Identification of loci associated with water use efficiency and symbiotic nitrogen fixation in soybean

Soybean (Glycine max) production is greatly affected by persistent and/or intermittent droughts in rainfed soybean-growing regions worldwide. Symbiotic N2 fixation (SNF) in soybean can also be significantly hampered even under moderate drought stress. The objective of this study was to identify genomic regions associated with shoot carbon isotope ratio (δ13C) as a surrogate measure for water use efficiency (WUE), nitrogen isotope ratio (δ15N) to assess relative SNF, N concentration ([N]), and carbon/nitrogen ratio (C/N). Genome-wide association mapping was performed with 105 genotypes and approximately 4 million single-nucleotide polymorphism markers derived from whole-genome resequencing information. A total of 11, 21, 22, and 22 genomic loci associated with δ13C, δ15N, [N], and C/N, respectively, were identified in two environments. Nine of these 76 loci were stable across environments, as they were detected in both environments. In addition to the 62 novel loci identified, 14 loci aligned with previously reported quantitative trait loci for different C and N traits related to drought, WUE, and N2 fixation in soybean. A total of 58 Glyma gene models encoding for different genes related to the four traits were identified in the vicinity of the genomic loci.


Introduction
Drought limits soybean [Glycine max (L.) Merr.] productivity by affecting diverse processes during reproductive (Sinha et al., 2021) and vegetative growth, including photosynthesis and biological nitrogen fixation (Kunert et al., 2016;Martynenko et al., 2016;Zhu et al., 2016;Dubey et al., 2019).Limitations in water availability to sustain crop growth and grain yield are expected to increase further in many regions of the world (Luo et al., 2019), amplifying the urgency to develop varieties with superior performance under water deficit stress conditions.However, given low heritability and complex scenarios such as timing, duration, and intensity of water deficit stress imposition, selecting for increased yield or maintenance of yield under drought stress is challenging for breeding programs (Bhat et al., 2016).Exploration of the genetics underpinning physiological traits that influence and/or are responsive to water deficit stress can provide insights that can be leveraged to develop more drought-tolerant cultivars (Sallam et al., 2019;Xiong et al., 2021).
Symbiotic N fixation provides an important advantage for soybean production compared to other grain crops, as it eliminates the need for N fertilization in most production environments under standard management practices.Up to 94% of the total N requirement by the plant can be acquired through SNF in N-deficient soil (Harper, 1987).However, despite the significant capacity for SNF, N availability can limit soybean yields under a range of environmental conditions, including drought (King and Purcell, 2001;Purcell et al., 2004;King and Purcell, 2006;King et al., 2014;Purcell, 2015), high soil temperature (Lindemann and Ham, 1979;Montañez et al., 1995), flooding stress (Pasley et al., 2020), and soil salinity (Elsheikh and Wood, 1995).SNF is highly sensitive to drought stress and this sensitivity can result in significant yield reductions under water-limiting conditions (Durand et al., 1987;Sinclair et al., 1987;Sall and Sinclair, 1991;Purcell et al., 1997).Sinclair et al. (2010) modeled the impacts of changes in selected morpho-physiological traits on soybean yields across much of the U.S. soybean production region and found that reducing SNF sensitivity to drought would result in the greatest yield benefit among the examined traits.These and other studies highlight the need and potential for improvements in SNF to enhance soybean productivity.
Genotypic variation in SNF under normal and water-limited conditions has been well documented, revealing opportunities to improve soybean by selecting genotypes with enhanced SNF and, in particular, genotypes capable of sustained SNF under water-limited conditions (King et al., 2014).This is exemplified by the success of Chen et al. (2007), who developed soybean varieties with enhanced drought tolerance by breeding with genotypes that maintain high SNF under water-limited conditions.To better understand the genetics and facilitate breeding for elevated and sustained SNF, several research groups have conducted studies to identify quantitative trait loci (QTLs) for SNF and/or SNF-associated traits in numerous legumes (e.g., Heilig et al., 2017;Kamfwa et al., 2019;, Bourion et al., 2010;Ohlson et al., 2018).In soybean, studies have explored QTL for several nodulation traits (Santos et al., 2013;Hwang et al., 2014;Yang et al., 2019;Zhu et al., 2019;Ni et al., 2022), ureide accumulation (Hwang et al., 2013;Ray et al., 2015a), N isotope ratio (d 15 N), (Steketee et al., 2019;Bazzer et al., 2020c), and percent N derived from the atmosphere (Ndfa) (Dhanapal et al., 2015b;Liyanage et al., 2023).
d 15 N is a measure of the relative abundance of 14 N and 15 N isotopes in plant tissue relative to their relationship in air and is expressed in ‰.Because soils have a higher 15 N abundance than air, SNF enriches 14 N relative to 15 N in tissues of N-fixing plants, which consequently exhibit a lower d 15 N compared to plants that rely on soil mineral N uptake for their N supply (Shearer and Kohl, 1986).Therefore, d 15 N can be used directly to compare the relative SNF of different genotypes (Steketee et al., 2019;Bazzer et al., 2020c) and to calculate the percent Ndfa when a genotype that depends only on soil mineral N as a N source is used as a reference (Kohl and Shearer, 1981).An advantage of the N stable isotope technique is that it can be used to assess large populations grown under field conditions and provides an integrated signal of relative SNF over the course of plant growth.
It is well established that leaf N content is closely related to net photosynthesis, N uptake is closely related to soybean yield, and N availability can limit soybean yields in a broad range of environments (Sinclair and Horie, 1989;Rotundo et al., 2014;Cafaro La Menza et al., 2017;Cafaro La Menza et al., 2020).Pazdernik et al. (1997) and Rotundo et al. (2014) showed that yields of elite soybean cultivars were more closely related with total N uptake (determined based on shoot analysis) than with total shoot biomass.As one of the factors in determining total N uptake, shoot tissue [N] concentration is, thus, of great relevance.Previously, King and Purcell (2006) showed that genotypes with inherently lower shoot [N] were able to maintain SNF compared to genotypes with inherently higher shoot [N].Of course, N assimilation is intimately intertwined with C assimilation, and CN-dynamics are closely regulated by complex mechanisms (Djekoun and Planchon, 1991).In soybean, these complex interactions also extend to SNF, which, for example, is more sensitive to drought than net photosynthesis (Djekoun and Planchon, 1991), and partial recovery from prolonged drought is slower for SNF than for photosynthesis (Djekoun and Planchon, 1991).Thus, the characterization of genotypic variation and mapping of shoot [N] and shoot carbon nitrogen ratio (C/N) ratio in conjunction with SNF and WUE can provide valuable information for soybean germplasm improvement.Indeed, in addition to studies mentioned above that have identified loci for d 13 C and d 15 N or Ndfa, previous research has also identified markers for soybean shoot [N] and/or C/N ratio (Hwang et al., 2013;Dhanapal et al., 2015b;Steketee et al., 2019).
d 13 C (WUE), d 15 N (SNF), [N], and C/N ratio are quantitative in nature and controlled by many genes with minor effects (Adiredjo et al., 2014;Dhanapal et al., 2015a;Dhanapal et al., 2015b;Kaler et al., 2017).Molecular markers and QTL associated with these traits can facilitate marker-assisted selection and genomic prediction and accelerate the breeding process.Previous soybean genome-wide association (GWA) studies identified genomic regions and markers associated with d 13 C, d 15 N, [N], and C/N ratio based on a limited number of molecular markers [12,347 SNP by Dhanapal et al. (2015a);31,145 SNP by Dhanapal et al. (2015b); 31,260 SNP by (Kaler et al. (2017);and 35,262 SNP by (Steketee et al. (2019)] derived from the SoySNP50K iSelect Beadchip (Song et al., 2013).In the current study, approximately 4.1 million SNP markers derived from whole-genome resequencing of 105 genotypes were used to conduct GWA mapping for d 13 C, d 15 N, [N], and C/N ratio.Whole-genome resequencing allows the detection of a large number of molecular markers, such as SNPs and Insertion-Deletions (Indels) in crop species (Xu and Bai, 2015;Goodwin et al., 2016;Torkamaneh et al., 2018), and high genomewide marker density increases the precision of marker-trait associations (Huang et al., 2009;Xu et al., 2013;Xu and Bai, 2015).The main objective of this study was to utilize the highresolution marker density derived from whole-genome resequencing to identify molecular markers and genomic regions associated with d 13 C, d 15 N, [N] and C/N traits in a diversity panel consisting of 105 soybean genotypes and to explore these regions for candidate genes that may control these traits.

Plant material and field experiments
Genotypes included in this study were selected because of available whole-genome resequencing data and the desired focus on maturity group (MG) II and III germplasm.The 105 diverse genotypes (Supplementary Table S1) were composed of 41 in MG II, 60 in MG III, and four in MG IV.Seeds, originally obtained from the United States Department of Agriculture (USDA) Soybean Germplasm Collection, were increased in 2013.Experiments were conducted at the Bradford Research Center near Columbia, Missouri, on a Mexico silt loam (fine, smectitic, mesic Vertic Epiaqualfs) soil.Seeds were sown on 6 May 2014 and on 5 May 2016 at a density of 34.4 seeds m −2 in 3-m long single-row plots that were spaced 0.76-m apart.The experiment established in 2015 had to be terminated due to inadequate emergence caused by adverse environmental conditions.Entries were arranged in a randomized complete block design with three replications.Due to the long history of soybean production at that location, inoculation with Bradyrhizobium diazoefficiens was omitted.Weeds were controlled with pre-(Dual II Magnum; S-metolachlor at 1.68 kg ha −1 ) and post-emergence (Basagran, Sodium salt of bentazon at 0.45 kg ha −1 ai., and Fusilade DX, Fluazifop-P-butyl at 0.425 kg ha −1 ai.) herbicide applications, which were complemented with manual weeding.Experiments were conducted under rainfed conditions without supplemental irrigation.Rainfall and temperature data were recorded by a weather station located within 500 m of the field sites and downloaded from the Missouri Historical Agricultural Weather Database (http://agebb.missouri.edu/weather/history/).

Phenotyping for stable isotope and C and N concentrations
Shoot biomass of five representative plants from each plot was harvested at the beginning of bloom to full bloom [R1 to R2; (Fehr et al., 1971)] stages in both years.Due to poor stand densities of 10 entries, only samples from 95 genotypes were collected in 2016.Samples were dried in a forced-air drier at 60°C until constant weight.Dry samples were ground (Thomas Model 4 R Mill, Thomas Scientific, NJ, USA) to pass a 2-mm screen, subsampled and processed with a cyclone mill (UDY Corporation, CO, USA) to pass a 1-mm screen, and in a third step, ball milled in 15-ml tubes with a 9.52-mm stainless steel ball (440C Stainless Steel Ball, Abbott Ball Company, Inc., CT, USA) in a Geno/Grinder (SPEX CeertiPrep, Inc., NJ, USA).Three milligram of the ball-milled sample from each plot was loaded into tin capsules (Costech Analytical Technologies Inc., CA, USA) and sent to the University of California Stable Isotope Facility in Davis, CA (https://stableisotopefacility.ucdavis.edu/carbon-and-nitrogensolids)for analysis with an isotope ratio mass spectrometer (IsoPrime, Elementar France) coupled to an elemental analyzer (EA3000, EuroVector).The d 13 C and d 15 N of each sample were expressed relative to the international standards V-PDB (Vienna Pee Dee Belemnite) and air, respectively, according to the following equations (O'Leary, 1981;Mariotti, 1983;Sharp, 2017): Carbon isotope ratio: where R is the ratio of the heavy to the light isotope ( 13 CO 2 / 12 CO 2 ) and R x and R VPDB are the isotope ratios of the sample and the standard, respectively.
Nitrogen isotope ratio: where, R is the ratio of the heavy to the light isotope ( 15 N/ 14 N) and R x and R airN2 are the isotope ratios of the sample and the standard, respectively.
Carbon and N concentrations were determined as part of the stable isotope analysis with the elemental analyzer (EA3000, EuroVector).The C and N concentrations of each sample were expressed as mg C or N g −1 dry weight.Sample C to N ratios (C/N) was calculated by dividing the C concentration by the N concentration.

Statistical analysis of phenotypic data
The two years, 2014 and 2016, were treated as two different environments.Analysis of variance (ANOVA) was performed in SAS 9.4 (SAS Institute Inc., Cary, NC, USA) for each year and across the two years using the PROC MIXED procedure.In individual-environment ANOVA, genotype was treated as a fixed effect, and replications within an environment were treated as random (Bondari, 2003).In across-environment ANOVA, all factors were treated as fixed effects, except replications within environments that were considered random effects (Bondari, 2003).The Pearson correlation coefficient between the traits within and across the environments was calculated in SAS 9.4 using PROC CORR.Entry-mean basis broad sense heritabilities were calculated as H 2 = s 2 g /[s 2 g + s 2 ge /k + s 2 e /(rk)], where s 2 g is the genotypic variance, s 2 ge is the genotype by environment interaction variance, k is the number of environments, r is the number of replications (Holland et al., 2003).The variance components were estimated using the PROC VARCOMP procedure in SAS 9.4 with the restricted maximum likelihood estimation method following Holland et al. (2003).
The best linear unbiased predictions (BLUP) were calculated for each trait for each year using PROC MIXED functions in SAS 9.4 (SAS Institute Inc., USA), where all factors were treated as random factors (Robinson, 1991;Piepho et al., 2008).Calculated BLUP values were used for GWA mapping.

Genotypic data, SNP identification, and filtering
Whole-genome resequencing (15X and 40X coverage depending on genotype) was conducted by others, and the information was deposited in Soykb [Liu et al. (2016b); http:// soykb.org/NGS_Resequence/NGS_index.php].SNPs were identified following the PGen workflow described by (Liu et al., 2016b).A total of 11,972,496 raw SNP markers were obtained for the 105 genotypes.Monomorphic markers and markers with 50% missing data were filtered out, and missing data for the remaining markers were imputed using BEAGLE 5.1 with default parameters (Browning et al., 2018).Finally, markers with less than 5% minor allele frequency were filtered out, resulting in 4,108,002 SNP that were used for GWAM.

Genome-wide association analysis
GWA mapping was performed by implementing the Fixed and Random Model Circulating Probability Unification (FarmCPU) model in R (Liu et al., 2016a).FarmCPU performs multi-locus mixed model in two parts, the fixed effect model and random effect model, and uses them iteratively until there is no change between them in terms of identified markers.Principle component analysis was conducted using PLINK 1.9 (Purcell et al., 2007) with a subset of 50,000 LD pruned markers.PCs that cumulatively explain 25% of the total variation of the population were added as covariates in the FarmCPU model to control for the population structure.
As the Bonferroni p-value threshold to declare a marker significantly can be too stringent and exclude some true associations resulting in false negatives (Moghaddam et al., 2016;Arifuzzaman et al., 2019;Kaler et al., 2020), the p-value thresholds determined by using "FarmCPU p-value threshold" function with 100 permutations were used.This function breaks the relationships of the phenotypes with the genotypes through permutation and suggests a suitable p-value threshold for the respective trait (Liu et al., 2016a).As the phenotypic distribution of each trait differs, the FarmCPU p-value threshold function generates different p-value thresholds for different traits.In addition to all markers meeting the stringent cutoff p-value thresholds determined by the FarmCPU threshold function, results were also inspected for markers that passed a -log (p-value) threshold of 3.5 for each trait.Markers identified based on this threshold were considered significant either if they were detected in both years or detected in one of the two years and were within 1.5 Mb upstream or downstream of a locus for the respective trait identified in the previous studies (Dhanapal et al., 2015a;Dhanapal et al., 2015b;Kaler et al., 2017;Steketee et al., 2019;Bazzer et al., 2020a;Bazzer et al., 2020b;Bazzer et al., 2020c).
Stepwise regression procedure was implemented in R statistical software (function "step") to identify the minimal number of markers independently associated with a particular trait in each year (Meyer et al., 2013;Gurung et al., 2014;Mamidi et al., 2014;Delaneau et al., 2017).The stepwise regression procedure accounts for QTL × QTL interactions and retains only the major and independent QTL (Delaneau et al., 2017).Significant markers with stringent cutoffs and those identified in both years with a cutoff of -log (p) = 3.5 were included in the stepwise procedure.In this procedure, both the marker and the model needed to be significant at P < 0.05 for the stepwise inclusion of a marker.Genome-wide LD blocks were estimated using the "BigLD" function in the gpart R package (Kim et al., 2019) with an LD cutoff of 0.7.If multiple significant markers were found within the same LD block, they were tagged as one locus.Otherwise, single markers represent the anchoring markers of the respective LD region.

Candidate genes
All gene models within the LD block of each significant marker were extracted from G. max genome assembly version Glyma.Wm82.a2.v1 (https://soybase.org).To determine the gene annotations and Gene Ontology (GO) functions, extracted Glyma gene models were blasted against the TAIR 10 protein database (https://www.arabidopsis.org).Candidate genes were identified based on two methods: keyword searches and allele frequencies among extreme genotypes.
To identify candidate genes based on allele frequencies among extreme genotypes, 20 genotypes consisting of two groups of 10 genotypes from each phenotypic tail were selected for a given trait in each environment.Then, all SNPs within the LD region of a significant locus were isolated for the designated set of 20 genotypes.In the next step, the reference and alternate alleles among the two extreme groups of genotypes were identified, and the allele frequencies of an SNP for each group were calculated.The SNPs for which the reference allele frequency in one group was ≥ 80%, and the alternate allele frequency in the contrasting group was ≥ 80% were selected for candidate gene searches.Selected SNPs with these contrasting allele combinations were pursued if they were located within a gene model.
For the candidate genes identified by the above methods, the literature was explored for further information.In addition, the predicted effect of polymorphism on the gene models identified by both methods mentioned above was determined using SnpEff (v4.0) (Cingolani et al., 2012).The SNPeff score was calculated for each gene model based on the weighted sum of three main categories of variants (SNPeff_score = HIGH × 20 + MODERATE × 10 + LOW × 1) (Cingolani et al., 2012;Lovell et al., 2021).The description of the variant categories, "HIGH," "MODERATE," and "LOW" can be found at https://pcingola.github.io/SnpEff/se_inputoutput/#effectprediction-details.

Phenotypic diversity, relationships, and heritabilities
ANOVA revealed highly significant (P < 0.0001) genotype and genotype x environment interaction effects for d 13 C, [N], and C/N ratio.The environment effect was also significant for d 13 C (P < 0.0001), [N] (P < 0.01), and C/N (P < 0.01).For d 15 N, genotype and environment effects were highly significant (P < 0.0001 and P < 0.001), but the genotype x environment interaction effect was marginal (P = 0.08).The observed environment and genotype x environment interaction effects were consistent with the considerable differences in environmental conditions between the two years.Although mean daily temperatures from planting to sampling were similar in the two years (21.3°C vs. 22.2°C), temperatures during early growth were higher (18.6°C vs. 16.9°C), and those later in the season were lower (June: 23.1°C vs. 24.7°C;July: 22.3°C vs. 24.9°C) in the first than the second year (Supplementary Table S2).Likely of particular importance for the observed environment and genotype x environment interaction effects was the much lower precipitation from planting to sampling in 2014 (273 mm) than in 2016 (384 mm), which primarily resulted from the much lower precipitation in July (41 mm vs. 274 mm), and despite June precipitation being greater in the first than the second year (164 mm vs. 29 mm).Genotypic effects of the within-environment ANOVA were highly significant (P < 0.0001) for all traits in both environments except for d 15 N in 2014, where the genotype effect was significant at P < 0.05 level (Table 1).
Phenotypic values for all four traits exhibited a broad range in both environments, with d 13 C ranging by 5.98 ‰ and 2.10 ‰, d 15 N by 8.51‰ and 6.61 ‰, [N] by 22.01 mg g −1 and 17.16 mg g −1 , and C/N by 10.36 and 17.37, in 2014 and 2016, respectively (Table 1 and Figure 1).The greater ranges observed in 2014 for d 13 C, d 15 N, and [N] also were associated with greater mean values (d 13 C −27.56 vs. −28.71‰; d 15 N 6.19 vs. 4.53 ‰; and [N] 30.14 vs. 24.98 mg g− 1 ).In contrast, the mean C/N ratio (14.31 vs. 17.44) and the range in C/N ratio (10.36 vs. 17.37) were smaller in 2014 than in 2016.
The considerable differences in environmental conditions were also reflected in the relatively weak correlations of phenotypic data between environments, namely, r = 0.29 (P < 0.01) for d 13 C, r = 0.20 (P < 0.06) for d 15 N, r = 0.28 (P < 0.01) for [N], and r = 0.32 (P < 0.01) for C/N ratio (Table 2).Not surprisingly, strong negative correlations between the C/N ratio and [N] were observed in each environment and across both years (r = −0.93 or −0.97; P < 0.001).In contrast, relationships were not consistent among other traits when analyzed by environment, but, when analyzed across environments, the C/N ratio and d 13 C (r = −0.21;P < 0.05) as well as C/N ratio and d 15 N (r = −0.23;P < 0.05) were negatively correlated, and d 13 C and d 15 N (r = 0.21; P < 0.05) were positively correlated.Broad sense heritability estimates differed substantially between the traits and, to a lesser extent, between the two environments for a given trait.Heritability estimates for 2014 and 2016 were greatest for d 13 C (0.94 and 0.81), lowest for d 15 N (0.35 and 0.42), and intermediate for C/N ratio (0.85 and 0.67) and [N] (0.75 and 0.63).When combined across the two years, heritabilities were 0.58, 0.24, 0.39, and 0.44 for d 13 C, d 15 N, [N], and C/N ratio, respectively (Table 1).

SNP marker distribution and population structure
The 4,108,002 SNP markers retained after removing monomorphic markers, markers with 50% missing data, and markers with less than 5% minor allele frequency were widely distributed across the genome (Figure 2).The average marker density was 4,283 SNPs Mbp −1 , with the highest marker density on Gm18 (6,234 SNPs Mbp −1 ) and the lowest marker density on Gm05 (2,544 SNPs Mbp −1 ) (Supplementary Table S3).Within chromosomes, the distinct, typical pattern in SNP densities largely aligned with heterochromatic and euchromatic regions, with euchromatic regions on several chromosomes exceeding densities of 10K SNPs Mbp −1 (Figure 2).The average marker density in the euchromatic regions was 5690 SNPs Mbp −1 , compared to 3110 SNPs Mbp −1 in the heterochromatic regions (Supplementary Table S3).
The genotypes comprising the diversity panel were divided into eight subgroups based on principle component analysis (Figure 3).The subpopulations were not associated with any specific MG or location of origin.

Marker-trait associations
Given the significant differences between the two years, marker associations were identified based on BLUP values calculated for   S4], (2) a -log (p) value threshold of 3.5 in both environments or (3) a -log (p) value threshold of 3.5 in one of the two environments while also being within 1.5 Mbp of a previously identified locus documented in the literature.The SNPs identified in this manner were subjected to a stepwise regression to account for marker × marker interactions and retain only major and independent markers.The resulting SNPs and putative loci identified for each trait are listed in Table 3 and Supplementary Table S6.
Thirteen SNPs marking 11 putative loci were associated with d 13 C (Table 3 and Supplementary Table S6).A single locus each was identified on Gm01, Gm02, Gm04, Gm07, Gm08, Gm10, Gm13, Gm16, and Gm17, and two loci were identified on Gm14.Among these loci, the one on Gm13 was identified in both years.Except for      the locus on Gm02 marked by three SNPs, all loci were identified based on a single SNP.For d 15 N, 25 SNPs tagged 22 putative loci distributed across all chromosomes except Gm03, Gm09, Gm10, and Gm13 (Table 3 and Supplementary Table S6).Of the 22 putative loci, four (loci 3, 13, 17, and 22) were detected in both environments, whereas the other 18 loci were identified only in one of the two environments.Nineteen loci were marked by single SNPs, and three loci (1, 15, and 20) on Gm01, Gm17, and Gm19, respectively, were marked by two SNPs each.
Among the four traits, the largest number of SNPs, 35, was identified for [N] (Table 3 and Supplementary Table S6).These 35 SNPs tagged 21 putative loci, one of which, locus 10 on Gm06, was marked by 12 significant SNPs, and two others, locus 8 on Gm06 and locus 19 on Gm16, were marked by two SNPs.The 21 loci were located on 14 different chromosomes, with more than one locus identified on Gm03 (three loci), Gm04 (two loci), Gm06 (three loci), and Gm08 (three loci).One of the 21 loci, namely, locus 19 on Gm16, was detected in both environments.
For C/N ratio, 22 putative loci were tagged by 31 SNPs (Table 3 and Supplementary Table S6).The 22 putative loci were located on 15 chromosomes, including two loci on Gm04 and Gm16, three on Gm03, and four on Gm13.One of the loci, locus 5 on Gm04, was anchored by eight SNPs and was identified in both environments.Locus 9 on Gm07 and locus 19 on Gm16 were also identified in both environments and were marked by two SNPs each.

Phenotypes
Genotypic variation was significant for all traits (d 13 C, [N], d 15 N, and C/N) in both years (Table 1 and Figure 1).The genotypic variation in d 13 C is consistent with previous studies, which also documented significant genotypic variation.The estimated heritabilities for d 13 C of 0.94 and 0.81 in 2 years are relatively higher compared to the heritability estimates (0.59-0.72) for d 13 C in diversity panels consisting primarily of MGIV or MGVI-MGVIII soybean from the USDA soybean germplasm collection (Dhanapal et al., 2015a;Kaler et al., 2017;Steketee et al., 2019).Heritability estimates for [N] were moderate to high (0.75 and 0.63) and in agreement with those reported previously, which ranged from 0.37 to 0.73 (Dhanapal et al., 2015b;Steketee et al., 2019).For C/N, heritability estimates of 0.85 and 0.67 in the current study also exceeded the range of 0.37-0.59reported previously by (Dhanapal et al., 2015b).The lowest heritabilities among the four traits were estimated for d 15 N (0.35 and 0.42).The comparatively low heritabilities are consistent with Steketee et al. ( 2019) and Bazzer et al. (2020c), who also reported low to moderate heritability for d 15 N ranging from 0.07 to 0.40 in a GWAS and biparental mapping study, respectively.Low heritability for N 2 fixation was also documented by Dhanapal et al. (2015b) and Ray et al. (2015) based on N derived from the atmosphere (Ndfa) (0.15-0.26) and shoot ureide concentration (0.23-0.38), respectively.Low to moderate heritability for d 15 N in both years in the current study indicates considerable environmental influence on this trait, possibly due to spatial within-field variability in soil mineral N concentration (Steketee et al., 2019), which is well known to influence SNF in soybean (Evans, 2001;Donahue et al., 2020;Moro Rosso et al., 2023).Nonetheless, significant genotypic variation and heritability estimates indicate potential for genetic improvement for all four traits.
Relationships among the different traits generally varied between the 2 years, except for strong negative correlations (−0.97, p < 0.0001) and (−0.93, p < 0.0001) between shoot C/N ratio and shoot [N], which were expected (Table 2).These results suggest that environmental factors prevailing in the 2 years influenced traits to different extents.Few studies have compared the relationship between d 13 C and d 15 N or the other traits examined here in soybean.Steketee et al. (2019) reported a negative relationship between d 13 C and d 15 N and a positive correlation between d 13 C and N concentration of leaf tissue.In the present study, d 13 C and d 15 N of shoot biomass were positively correlated in 1 year but not the other.A positive correlation between these two phenotypes indicates that genotypes with higher WUE derived smaller amounts of N from SNF than those with lower WUE.However, the fact that these two traits were not correlated in the first year indicates that breeding for high WUE, as well as high N fixation, is not mutually exclusive.In fact, given the high sensitivity of soybean SNF to drought (Sinclair et al., 1987;Purcell et al., 2004;King et al., 2014), high WUE could be advantageous for SNF in some environments, which is suggested by the negative correlation between d 13 C and d 15 N that were observed by Steketee et al. (2019).This is also consistent with observations by Kumarasinghe et al. (1992), who reported a positive correlation between WUE efficiency and SNF based on C and N isotope measurements.Similar to the relationships between d 13 C and d 15 N, correlations between d 13 C and [N] and between d 13 C and C/ N were significant only in 1 year but not the other.Given the significant environment and genotype x environment interactions for these traits, differences between the 2 years with respect to the relationships between these traits were not surprising.

Population structure and marker-trait associations and candidate genes
Whole-genome resequencing data for all the genotypes used in this study provided much more abundant SNP markers (~4.1 million) to identify genomic regions associated with d 13 C, d 15 N, [N], and C/N ratio than previous studies that used genotypic data from the SoySNP50K iSelect SNP Beadchip (Dhanapal et al., 2015b;Kaler et al., 2017;Steketee et al., 2019).However, the goal of using SNPs derived from whole-genome resequencing data, coupled with the restriction based on maturity, resulted in a more limited number of entries for this study (105 entries) compared to previous GWAS studies (211-373 entries) examining these traits in soybean (Dhanapal et al., 2015a;Dhanapal et al., 2015b;Kaler et al., 2017;Steketee et al., 2019).Despite the lower number of entries than earlier GWA studies of these traits, the genetic diversity within the current panel was high and PC analysis used to assess population structure revealed eight subpopulations within the panel (Figure 3), which is similar to the six to nine subpopulations used by others for soybean GWA studies using whole-genome resequencing data or SoySNP50K iSelect SNP Beadchip data (Li et al., 2008;Dhanapal et al., 2015a;Dhanapal et al., 2015b;Contreras-Soto et al., 2017;Wang et al., 2018;Boudhrioua et al., 2020;Seck et al., 2020).In the current study, cutoff p-value thresholds to declare a marker-trait association significant were more stringent compared to other soybean GWA studies (Dhanapal et al., 2015a;Dhanapal et al., 2015b;Kaler et al., 2017;Zhang et al., 2018;Fang et al., 2020;Kaler et al., 2020;Song et al., 2020).Based on the applied thresholds, we identified a total of 104 significant marker-trait associations, which tagged 76 putative loci for the four traits.Although most of these loci were novel, several of them overlap or are located within close physical distances of loci previously reported for the four traits.

Carbon isotope ratio
GWA mapping of d 13 C identified a total of 11 loci, 10 of which were marked by single SNPs and one that was marked by three SNPs (Table 3 and Supplementary Table S6).Among these loci, six were novel, whereas five were closely located to previously identified markers for d 13 C by (Dhanapal et al., 2015a;Kaler et al., 2017;Steketee et al., 2019).These five loci (Table 3 and Supplementary Table S6) underscore the importance of these genomic regions relative to soybean WUE in different environmental conditions, and the application of orders of magnitude higher marker density than those used in previous studies may identify markers closer to candidate genes.Among the novel loci, locus 7 on Gm13 was identified in both years and had the greatest marker effect (−0.34, Table 3 and Supplementary Table S6) among the identified d 13 C loci.Exploration of the gene models in the vicinity of novel and confirmed loci revealed several genes with functions associated with stomatal morphogenesis and function (Table 3 and Supplementary Tables S7 and S8).Although not much is known about some of the gene models, for others, evidence that they affect plant responses to water availability, transpirational water loss, and/or WUE is available in the literature.A total of five candidate genes were identified at locus 7 based on keyword searches.Glyma.13G028200,encoding Photosystem II Reaction Center Protein A (PsbA), when overexpressed in maize (Zea mays), resulted in a much lower reduction in net photosynthesis and stomatal conductance than in wild-type plants under drought conditions (Huo et al., 2016).Another candidate gene, Glyma.13G030900,encoding NAC domain transcription factor ATAF1, is expressed in different tissues in Arabidopsis, including in stomatal guard cells (Lu et al., 2007).Wu et al. (2009) showed that ATAF1 overexpressing plants treated with ABA exhibited a greater extent of stomatal closure compared to ataf1 mutant and wild types, indicating ATAF1 plays a role in ABA-mediated stomatal closure to reduce water loss.SPEECHLESS (SPCH) (Glyma.13G040100),another locus-7 candidate gene, can control the stomatal number and density by initiating asymmetric cell divisions to establish stomatal lineage in Arabidopsis (MacAlister et al., 2006). Tripathi et al. (2016) showed that SPCH and its two other co-orthologues had reduced mRNA expression in drought-induced soybean leaves, resulting in reduced stomatal density (SD) and stomatal index compared to the untressed soybean leaves.
Locus 2 on Gm02, anchored by three SNPs, is particularly interesting as both Dhanapal et al., 2015a andKaler et al. (2017) reported QTL for d 13 C close to this locus.Additionally, Kumar and Lal (2015) also found one highly polymorphic SSR marker close to this locus after testing several SSR markers for polymorphism between soybean genotypes with varying CID.However, only one gene model (Glyma.02G113800)was identified as a hypothetical protein, and a GO annotation of "stomatal complex morphogenesis" was identified within this locus based on keyword searches and allele combinations.
Steketee et al. ( 2019) reported significant loci for d 13 C within 1.5 Mb upstream or downstream of loci 6, 8, and 10 of the current study.Although no candidate genes were identified in the vicinity of locus 6, five were identified at locus 9, and one at locus 10.Two of the genes at locus 9 encode a CBL-interacting protein kinase 2 (CIPK2; Glyma.14G202700) and a CBL-interacting protein kinase 11 (CIPK11; Glyma.14G203000).These genes are annotated with GO biological processes like stomatal movement and ABA signaling.Overexpression of CIPK2 in tobacco, Arabidopsis, and soybean has been reported to confer drought tolerance in control environment studies by modulating stomatal closure through ABA signaling under drought stress (Wang et al., 2016;Xu et al., 2021).On the other hand, overexpression of CIPK11 confers drought sensitivity in transgenic Arabidopsis (Ma et al., 2019).Further experiments by Ma et al. (2019) suggest that CIPK11-mediated plant response to drought tolerance is dependent on Di19-(dehydration-induced19-3), a transcriptional activator known to be involved in modulating plant response to different abiotic stresses, including drought stress.Another candidate gene at locus 9, Glyma.14G202900,encodes CHAL or EPFL-6, which is a homologue of Epidermal Patterning Factor-1 (EPF1) (Abrash and Bergmann, 2010).Overexpression of EPF1 in rice significantly reduces the SD and anatomical maximum stomatal conductance, leading to reduced water consumption and higher intrinsic WUE (Caine et al., 2019;Mohammed et al., 2019;Caine et al., 2023).The identified candidate gene models are targets for further studies that are needed to ascertain the extent of involvement, or lack thereof, of these candidate genes (Table 3 and Supplementary Tables S7 and  S8) in soybean WUE.

Nitrogen isotope ratio
The extent of SNF can be assessed under field conditions using N stable isotopes.Nitrogen isotope ratio (d 15 N) of plant tissue is commonly used to determine relative SNF expressed as %Ndfa.Nitrogen derived from the atmosphere is computed by relating the d 15 N value of a genotype to the d 15 N value of a non-nodulating reference line (Kohl et al., 1980).However, d 15 N can also be used directly to assess relative SNF if non-nodulating reference lines are unavailable because the d 15 N of the reference non-nodulated line is often constant across the field (Peoples et al., 2002;Steketee et al., 2019;Bazzer et al., 2020c).Here, association analysis based on d 15 N identified 22 loci on 16 different chromosomes.Four of these loci were detected in both environments, indicating the stability of these loci despite significant environmental effects between the environments (Table 3 and Supplementary Table S6).Additionally, three of the loci, including one of those identified in both environments (locus 13), marked genomic regions previously identified by Steketee et al. (2019).Interestingly, no candidate genes were identified in the genomic regions marked by these loci, neither based on keyword searches nor based on allelic combination analyses.However, four candidate genes were identified in the vicinity of novel d 15 N loci 4, 7, 8, and 21.Among the four gene models, Glyma.08G152700 at locus 8 is annotated as an AGAMOUS-like 26 (AGL-26) homologue, which, in Arabidopsis, is responsive to N deprivation and re-supply (Gan et al., 2005).AGAMOUS-like transcription factors also appear to be involved in regulating the symbiotic relationship between the plant and rhizobia in common bean (Phaseolus vulgaris) (Ayra et al., 2021).
Similarly intriguing is the potential role of Importin a nuclear transport receptors (Harreman and Adam, 2004), which may be involved in nodulation processes based on GO biological functions annotation.This suggests that the Importin a isoform 4 homologue identified at locus 21 may play a role in nodulation in soybean.Additional studies aimed at identifying the regulatory regions associated with the identified loci are critical to better understand and improve SNF in soybean.Of particular interest for follow-up, studies are the loci that were consistent across the environments as well as those that overlapped with previously identified regions of the soybean genome (Table 3 and Supplementary Table S6).Among these, loci 3 and 22 are highly significant as they both possess very high logarithm of odd ratio (Table 3 and Supplementary Table S6) in both environments compared to our cutoff values (Supplementary Table S4).These two loci also exhibit the largest marker effects compared to all other significant markers (Table 3 and Supplementary Table S6), but, interestingly, no candidate genes were identified in the vicinity of either of these loci.

N concentration
Twenty-one loci were detected for shoot [N] in this study; among them, loci 3 and 19 are located in the proximity of previously detected loci for leaf [N] by Steketee et al. (2019), one locus (17) coincided with a shoot [N] locus previously detected by Dhanapal et al. (2015b).Furthermore, loci 3, 9, 11, and 17 were near significant markers identified for shoot ureide concentration by Ray et al. (2015) (Table 3 and Supplementary Table S6).This is of interest because ureides (allantoin and allantoate) are the N-rich products from N 2 -fixation that are transported throughout the plant and may be used as indicators of the sensitivity of N 2 -fixation to drought stresses in common bean and soybean (Sinclair et al., 2000;Vadez and Sinclair, 2001;King and Purcell, 2005;Coleto et al., 2014).Even though based on analysis of [N] in different plant tissue (leaf vs. whole shoot;Steketee et al., 2019) or a different N-related trait (ureide concentration), the coincidence of loci between these studies is intriguing and warrants further attention.Therefore, the five loci associated with both shoot [N] and ureide concentration may be valuable to breed for sustained N 2 fixation under waterlimited conditions.
Several candidate genes were identified in the vicinity of locus 19, which was tagged for [N] in both environments and by Steketee et al. (2019).This includes a gene model (Glyma.16G156700)identified by keyword search that encodes a SCARECROW (SCR) homologue.In several legumes, including soybean, SCR is expressed in root cortical cells, and, in Medicago truncatula, nodule number and density are reduced in SCR mutants (Dong et al., 2020).Based on allelic comparisons at locus 19, several transporters, as well as two transcription factors were included in the candidate gene list (Table 3 and Supplementary Table S8).This included a GATA transcription factor (Glyma.16g155300), numerous of which have been shown to be responsive to N availability, and two that were implicated to be involved in the regulation of soybean N metabolism (Zhang et al., 2015).An Ammonium transporter 2 (AMT2) homologue (Glyma.02G043700)was identified close to the SNP that anchored the novel [N] locus 2. In Arabidopsis, AMT2.1 contributes to ammonium uptake in roots and functions mainly in root-to-shoot translocation of ammonium under N deficiency conditions (Giehl et al., 2017).As the N status of soybean is a function of both SNF and uptake of mineral N from the soil, it is not surprising that candidate genes associated with both processes were identified within the LD regions of loci associated with shoot [N].Indeed, it appears that breeding for greater yield has enhanced the ability of soybean to acquire more N from both sources (Donahue et al., 2020).

Carbon nitrogen ratio
The relationships among C assimilation, N uptake, and SNF are closely regulated.Photosynthetic output can be negatively affected due to N deficiency but can recover with N supplements (Coruzzi and Bush, 2001;Coruzzi and Zhou, 2001).On the other hand, increased C supply can improve N uptake and metabolism (Coruzzi and Bush, 2001;Coruzzi and Zhou, 2001).In a hydroponic study, Bacanamwo and Harper (1996) found that the shoot C/N ratio was positively correlated with the activity of nitrogenase in soybean nodules and suggested that regulation of nitrogenase activity entails tissue C as well as N concentrations.The balance between C and N assimilation in soybean also is influenced by the negative impact of drought, which is more severe on SNF than on leaf photosynthesis, and partial recovery from prolonged drought is slower for SNF than for photosynthesis (Djekoun and Planchon, 1991).Significant genotypic variation in the C/N ratio exists in soybean, but, to our knowledge, only one study has identified molecular markers for this trait.Dhanapal et al. (2015b) used GWAS to identify 17 putative C/ N ratio loci in MGIV soybean grown in multiple field environments.Interestingly, none of the 22 loci associated with C/N in the present study overlapped with any of those reported by Dhanapal et al. (2015b) (Table 3 and Supplementary Table S6).The absence of overlap with loci identified by Dhanapal et al. (2015b) was unexpected, particularly in light of the coincidence in loci for shoot [N] with previous studies and the detection of three loci (5, 9, and 19) in both environments of the present study, but likely is due to factors such as complexity of the regulation of C and N uptake and assimilation, particularly under different environmental conditions, as well as the makeup of the diversity panels used in the two studies.Not surprisingly, given the strong correlation between the two traits (Table 2), five C/N-associated loci (1, 6, 11, 19, and 21) colocalized with [N]-associated loci in the current study.Naturally, this also led to an overlap in gene models of interest for locus 19 (the only locus among these five for which candidate genes were identified).Among the candidate genes identified at other C/N loci, Glyma.08G118500encoding C-terminally Encoded Peptide 14 (CEP14) was identified based both on keyword search and on the allelic combination approach.The anchoring marker of locus 10 is located within the coding region of CEP14, which also is within about 600 Kb of the d 15 N locus 7 and [N] locus 12.In Arabidopsis, under N-limitation, CEP1 serves as a root-to-shoot signal that is perceived by CEP receptors in the shoot which leads to shoot-toroot signaling that causes an upregulation of nitrate transporter genes in roots (Tabata et al., 2014).CEPs are also involved in the regulation of nodule numbers in legumes, where CEPs positively regulate nodulation (Imin et al., 2013;Laffont et al., 2020).Several CEPs, including CEP14, are regulated by environmental cues such as N depletion and increased CO 2 levels in shoots and roots of Arabidopsis (Delay et al., 2013), as was observed for CEP1 in Medicago (Imin et al., 2013).Further studies are required to establish whether CEP14 is causal for the association of locus 10 with C/N, but its involvement in the signaling of N conditions and responsiveness to CO 2 documented in the literature underscores its potential as a target to develop a better understanding of the regulation of C and N status in soybean.

Conclusion
Genomwide association analysis was conducted for d 13 C, d 15 N, [N], and C/N with 105 genotypes and a high-density marker panel consisting of more than 4.1 million SNPs derived through wholegenome resequencing.A total of 76 genomic loci associated with these traits were detected, namely, 11 for d 13 C, 22 for d 15 N, 21 for [N], and 22 for C/N.Nine loci associated with different traits were consistent across the two environments.In addition, several loci were associated with more than one trait, including four loci for [N] and C/N, and one locus for d 15 N and C/N and, consequently, may serve as markers for multiple traits.Although most identified loci were novel, 14 aligned with previously detected genomic regions for the same or similar traits.Exploration of the gene models in the vicinity of the loci identified 59 candidate genes.Further scrutiny of these candidate genes will facilitate a better understanding of WUE, SNF, and N and C dynamics.The genomic regions and candidate genes identified in this study may serve to breed for soybean with enhanced WUE and improved and sustained SNF and to further dissect the mechanisms controlling WUE and SNF.
).To declare a marker-trait association significant, markers had to either pass (1) a -log (p) value threshold determined by the FarmCPU model[-log (p)  values ranged from 4.11 to 5.40; Supplementary Table

FIGURE 3
FIGURE 3Principal component analyses plots showing the first two principal components separated the whole germplasm panel into eight subpopulations.

TABLE 1
Descriptive statistics and heritability of the traits in two environments, 2014 and 2016.

TABLE 2
Pearson correlation coefficients between the traits in two environments and across the environments (AEs).
each trait for each year using the FarmCPU model (Figures4, 5

TABLE 3
Significant markers and loci associated with carbon istope ratio (d13C), nitrogen isotope ratio (d15N), N content and C/N ratio in two environments, 2014 and 2016, after stepwise regression.
Bold * Loci that were detected in both years.jCandidate genes based on allelic combination.yCandidate gene identified by both "keyword search" and "allelic combination.+Significant allele combination score in both environment.A