Genetic Relationships Among Physiological Processes, Phenology, and Grain Yield Offer an Insight Into the Development of New Cultivars in Soybean (Glycine max L. Merr)

Soybean grain yield has steadily increased during the last century because of enhanced cultivars and better agronomic practices. Increases in the total biomass, shorter cultivars, late maturity, and extended seed-filling period are frequently reported as main contributors for better soybean performance. However, there are still processes associated with crop physiology to be improved. From the theoretical standpoint, yield is the product of efficiency of light interception (Ei), radiation use efficiency (RUE), and harvest index (HI). The relative contribution of these three parameters on the final grain yield (GY), their interrelation with other phenological–physiological traits, and their environmental stability have not been well established for soybean. In this study, we determined the additive–genetic relationship among 14 physiological and phenological traits including photosynthesis (A) and intrinsic water use efficiency (iWUE) in a panel of 383 soybean recombinant inbred lines (RILs) through direct (path analyses) and indirect learning methods [least absolute shrinkage and selection operator (LASSO) algorithm]. We evaluated the stability of Ei, RUE, and HI through the slope from the Finley and Wilkinson joint regression and the genetic correlation between traits evaluated in different environments. Results indicate that both supervised and unsupervised methods effectively establish the main relationships underlying changes in Ei, RUE, HI, and GY. Variations in the average growth rate of canopy coverage for the first 40 days after planting (AGR40) explain most of the changes in Ei. RUE is primarily influenced by phenological traits of reproductive length (RL) and seed-filling (SFL) as well as iWUE, light extinction coefficient (K), and A. HI showed a strong relationship with A, AGR40, SFL, and RL. According to the path analysis, an increase in one standard unit of HI promotes changes in 0.5 standard units of GY, while changes in the same standard unit of RUE and Ei produce increases on GY of 0.20 and 0.19 standard units, respectively. RUE, Ei, and HI exhibited better environmental stability than GY, although changes associated with year and location showed a moderate effect in Ei and RUE, respectively. This study brings insight into a group of traits involving A, iWUE, and RL to be prioritized during the breeding process for high-yielding cultivars.


INTRODUCTION
Through the combined contribution of breeding, agronomy, and climate change, soybean yield has achieved a dramatic improvement. A steady yield increase of 24.7 kg ha −1 year −1 USDA-NASS, 2020) has almost quintupled productivity compared with the 740 kg ha −1 produced in 1924. Retrospective studies showed that breeding and agronomy have effectively contributed to a relatively similar percentage to the soybean yield improvement during the last decades (Specht and Williams, 1984;Specht et al., 1999). Contribution of increased CO 2 , also called carbon fertilization, is based on the stimuli in the net carbon fixation in species C3 via better control of photorespiration (Specht et al., 1999;Ainsworth et al., 2012;Taiz et al., 2014). Variation on productivity as a result of CO 2 increase has been estimated in a wide interval from 4.3 to 32.0% with a likely contribution in the range of 5-10% (Specht et al., 1999;Ainsworth et al., 2012;Sakurai et al., 2014).
Through changes guided by genetic, breeding, and market, soybean went from being considered a forage crop using plant introduction from East Asia in the early 1900s to the adoption of bred cultivars with better adaptation to North America in 1940 (Hartwig, 1973;Rincker et al., 2014). Selection for yield was the first target and later complemented with pest resistance, while proprietary breeding programs joined public efforts at the level of currently providing most of the soybean seed required for farmers in North America (Carter et al., 2004;Specht et al., 2014). Breeding strategies have focused on optimizing plant structure and seed composition. New cultivars are frequently shorter, are less prone to lodging and shattering, mature later, and also produce more branches and more pods from these branches especially under low density (Specht and Williams, 1984;Carter et al., 2004;Evans and Sadler, 2008;Fox et al., 2013;Rincker et al., 2014;Suhre et al., 2014). Improvements in canopy along with an extended seed-filling length led to greater solar radiation capture during this developmental stage (Boerma and Ashley, 1988;Kumudini et al., 2001;Koester et al., 2014). Augmented total dry matter production has contributed heavily in better yielding regardless of the mixed reports about increased or constant dry matter partition to the seeds (Kumudini et al., 2001;Rowntree et al., 2014;Balboa et al., 2018). Modern cultivars also incorporated resistance to pest and disease reducing potential losses (Johnson, 1987;Heatherly and Elmore, 2004;De Bruin and Pedersen, 2009). Breeding achievements also involved transgenic soybean and resistance to glyphosate, which, since 1996, changed the weed control, making it more flexible, simpler, and opportune (Reddy, 2001). Seed composition and yield components have been optimized to meet new requirements for industry and human health (Morrison et al., 2000;Ustun et al., 2001). While protein concentration was reduced, oil concentration and oil composition were increased, favoring monounsaturated fat acids (oleic) (Wilcox et al., 1979;Morrison et al., 2000;Ustun et al., 2001;Giannakas and Yiannaka, 2004;Rowntree et al., 2013;Rincker et al., 2014). Increase in seed weight is not always consistent or, if positive, less than 0.10 g per 100 seeds, suggesting bigger contribution to increased yield from more seeds per plant, or more plant per hectare (Specht and Williams, 1984;Voldeng et al., 1997;Morrison et al., 2000;Wilson et al., 2014).
Theoretical calculations indicate soybean grain yield potential is around 8,000 kg ha −1 (Specht et al., 1999). However, the current yield is still quite far from this potential with 3,409 kg ha −1 in 2020 USDA-NASS, 2020). Although closing the gap is a common effort involving not only plant breeding but also better agronomic practices, a clear identification of factors or traits to be prioritized must be carried out to concentrate efforts and resources. From the physiological standpoint, potential grain yield is the product of efficiencies accounting for the capture and transformation of solar radiation into biomass abbreviated, respectively, as Ei and RUE, and the later efficiency of allocation of dry matter to the economically important organs or HI (Monteith, 1972(Monteith, , 1977. In soybean, although studies involving one or more of these three efficiencies are available with particular focus on HI (Shibles and Weber, 1966;Spaeth et al., 1984;Board and Harville, 1993;Kumudini et al., 2001;De Bruin and Pedersen, 2009;Fox et al., 2013;Koester et al., 2014;Rowntree et al., 2014;Suhre et al., 2014), the influence of other physiological and phenological variables on Ei, RUE, and HI as well as the interrelation among this three efficiencies and their partial contribution to GY is not documented in soybean.
Determining the relationship among these agronomical, physiological, and phenological variables requires the implementation of multivariate methodologies where genetic and environmental relationships are established. Classical approach to establish interrelation among variables include the supervised path analysis method, where a set of lineal equations are defined based on a correlation matrix and theoretical background (Wright, 1960;Bondari, 1990;Walsh and Lynch, 1998). Path coefficients provide more information than traditional correlations since they not only present the partial contribution of predictors on the response variables but also report direct and indirect effects (Bondari, 1990;Board et al., 1999). Unsupervised machine learning methods offer new alternatives to establish complex interactions among variables through undirected graphical models (Hastie et al., 2009;Steinsland and Jensen, 2010). An example is the Markov network machine learning method, which does not require specificity for direction and is suitable for spatial or relational data for uncovering variable structure and dependence (Murphy, 2014). Previous studies to establish interrelations among agronomical and phenological variables have been already performed, and works through historical panels have also indirectly approached these relationships (Specht et al., 1999;Morrison et al., 2000;Rincker et al., 2014;Suhre et al., 2014;Xavier et al., 2017a). Directed and undirected methods in soybean have been independently reported by Board et al. (1999) and Xavier et al. (2017a), focusing on, in the first case, yield components and, in the second, phenology, canopy development, and yield component. However, these studies lack the inclusion of physiological processes and efficiencies accounting for changes in the potential yield such as Ei, RUE, A, and iWUE. In addition, comparison of result from these two methods in soybean is not reported.
In this study, we established the genetic correlations among agronomical, physiological, and phenological variables and the three efficiencies controlling the potential grain yield in soybean: efficiency of light interception, radiation use efficiency, and harvest index (Monteith, 1972(Monteith, , 1977. Likewise, we determined the relative contribution of Ei, RUE, HI, and other physiological variables as A, and iWUE to the GY in soybean through direct (path analysis) and undirected graphical model [least absolute shrinkage and selection operator (LASSO) algorithm] methodologies based on additive-genetic variance-covariance matrices. Finally, we evaluated the stability of Ei, RUE, and HI using the genetic correlations between the same trait evaluated in a different environment and the slope from the Finlay and Wilkinson joint regression (FWR). This paper suggests traits to be prioritized during the breeding process as a strategy to improve the grain yield in soybean.

Plant Material and Experimental Design
A maturity-controlled panel of 383 recombinant inbred lines (RILs) selected from the Soybean Nested Association Mapping collection SoyNAM was used. A panel was selected constraining maturity and evaluated as days required to get R8 (Fehr and Caviness, 1977), while retaining variation to GY (Supplementary Figure 1). Thus, the maturity group for these indeterminate RILs was similar and considered as group 3 MG III. RIL selection was performed using as data set field data collected during the seasons 2011 and 2014 in Indiana and Illinois. These 383 RILs come from 32 families classified into three main classes according to the type of cross originally made: high yielding (HY), high yielding under drought conditions (HYD), and diverse ancestry (DA). A complete description of families, crosses, and extra information is available in 1 and Lopez et al. (2019), while the complete list of families and RILs is presented in Supplementary Tables 1

Phenotypic Traits
A fixed-wing UAV-type eBee equipped with an S.O.D.A red-green-blue (RGB) camera (senseFly Parrot Group, Switzerland) was flown with a frequency of ∼12 days. Canopy coverage (CC) was obtained from the RGB imagery through the software Progeny R (Progeny Drone Inc., West Lafayette, IN) using a multilayer mosaic approach as described by Frontiers in Plant Science | www.frontiersin.org Hearst (2019). Aboveground dry matter for each plot was sampled during the growing season in a linear section of 0.56 m in a row with perfect competence. The fresh biomass collected from each sampling site was dried at 80 • C using a dry air system until constant weight. Three full biomass samplings (∼38, 58, and 84 days after planting -DAP) were considered for both environments in 2018, while just one sampling when maximum biomass accumulation was achieved at 91 DAP was carried out in 2017. Biomass was adjusted through a linear model involving RIL, environment, and replication as variables and number of plants as a covariate to avoid potential differences in biomass due to the number of plants. Seed weight was directly calculated from a lineal sample size of 0.56 m harvested and threshed at maturity (R8) (Fehr and Caviness, 1977). Ei was calculated as the simple ratio between the solar radiation intercepted by the canopy and the total photosynthetic active radiation (PAR) available. To determine the daily solar radiation intercepted, a series of 766 logistic models, one per plot, were fitted following equation (1) through the R software (R Core team, 2019) package "growthrates" (Petzoldt, 2018). We used the CC as a proxy for light interception considering the direct relationship between these two parameters previously documented by Purcell (2000) and Xavier et al. (2017b).
where y is canopy coverage, yo is the minimum canopy coverage value measured, k corresponds to the maximum canopy value or load capacity, µmax is the maximum relative growth rate, and time indicates days after planting. Radiation use efficiency in 2018 environments was calculated as the slope of a linear regression between the total dry matter aboveground and the cumulative PAR intercepted. In 2017, since only one biomass sampling was performed, a simple ratio between the total aboveground dry matter and the cumulative PAR intercepted was used. Apparent HI was calculated as the direct ratio between the seed weight (0% moisture) and the total aboveground dry matter. Grain yield was determined in two perfect competence rows from each plot through a mechanical harvest. The weight registered was adjusted to 13% seed moisture and extrapolated to the hectare. Phenological stages R1, R5, and R8 corresponding to days required to achieve flowering, beginning of seed, and maturity were scored three times per week following the criteria presented by Fehr et al. (1971). Length of the reproductive period was obtained by subtracting days to R8 to days to R1, while seed-filling length was calculated as R7 minus R5 in days.
AGR40 was measured as the mean of the daily growth rate during the first 40 days after planting. Growth rate corresponds to the first derivative from each logistic model adjusted for CC. Photosynthesis and intrinsic water use efficiency were measured through a portable photosynthesis system (LI-COR 6400XT, LI-COR, Lincoln, NE) set with a PAR value of 1,600 µmol photons m −2 s −1 . CO 2 concentration, temperature, and relative humidity were controlled to be 400 µmol mol −1 , 25 • C, and 75 ± 10%, respectively. The gas exchange parameters were measured before the seed filling phenological period, from late R4 and early R5 (Fehr and Caviness, 1977), in the third uppermost fully developed leaf, in three representative plants from each experimental unit from a complete replication. Additional details about the gas exchange protocol and measurements are available in Lopez et al. (2019).
Maximum leaf area index was recorded in a single measurement when the full canopy was achieved (60-70 DAP). A portable canopy analyzer (LI-2200, LI-COR, Lincoln, NE) following the protocol for small plots in row crops suggested by LICOR (LI-COR Inc, 2012) was used. Light extinction coefficient (K) was calculated through the light attenuation within a canopy theory reported by Monsi and Saeki (1953). Maximum LAI along with light measurements above and below the canopy was considered following equation (2): where I is the photosynthetic photon flux density (PPFD) measured on a horizontal plane, LAI is the leaf area index cumulated from top of the canopy, and K is the extinction coefficient. I 0 is the PPFD above the canopy.

Genetic Correlations
Best linear unbiased estimator (BLUE) per environment was calculated through a mixed model approach through the "lme4" package (Bates et al., 2015) in the software R following the statistical model below: where Y is the vector of phenotypes measured in the i th replication, into the j th block for the k th RIL. µ is the intercept, f (x) controls the spatial heterogeneity within replications, α accounts for the effect of replication, αβ corresponds to the interaction replication × block, δ accounts for the genetic effect, and e controls the error. The covariate f (x) was computed as the average phenotypic value from the four closer surrounding plots (Lado et al., 2013) through the function NNsrc from the R "NAM" package (Xavier et al., 2015). In this model, the spatial covariate and the RILs were treated as fixed effects, while the other sources of variation were considered as random with any random effect r ∼ N(0, σ 2 r ), and e ∼ MVN(0, R). BLUEs standardized by environment for all the traits were used to fit a second mixed model in a multivariate approach through the function reml in the "NAM" R package (Xavier et al., 2015). Additive-genetic effects were accounted for in this second model through a kinship matrix generated from a set of 23,119 single-nucleotide polymorphisms (SNPs) (Lopez et al., 2019). From this multivariate mixed model, two variance-covariance matrices were produced, G and R, where G corresponds to the additive-genetic matrix and R (residual) resembles the environmental relationships since BLUE values were used as input data. Correlations were calculated following the standard formula using the covariance between traits as the numerator and the product of their standard deviation as the denominator.

Path Analysis, Unsupervised Model, and Environmental Trait Stability
A path analysis using the additive-genetic correlation derived from the G matrix was carried out to calculate the standardized path coefficients through the R package latent variable analysis "lavaan" (Rosseel, 2012) followed by a graphical representation through the R package "semplot" (Epskamp et al., 2019). Likewise, we implemented an undirected graphical model based on the same G matrix to establish the connection among traits. A Gaussian undirected graphical model based on neighborhood selection with the LASSO algorithm (Meinshausen and Bühlmann, 2006) implemented in the R package "huge" (Zhao et al., 2012). Finally, environmental stability for Ei, RUE, and HI was evaluated through two methodologies: (1) as the additive-genetic correlation between the same traits measured in the three different environments and (2) through the slope of the FWR (Finlay and Wilkinson, 1963). The Kendall correlation is used rather than the Pearson correlation, since Kendall assesses statistical association based on ranking (Kendall, 1938); thus, a positive correlation means that when the rank of certain trait evaluated in one environment increases, the rank of the same trait evaluated in another environment also increases. Kendall correlations were evaluated using the software R following formula (4): where τ indicates the Kendall correlation and n is the number of observations. Where FWRs were implemented through the "FW" package in R under a Bayesian approach (Lian and de los Campos, 2016;Kusmec et al., 2017;Vanous et al., 2019). A nongenomic relationship matrix was used during the implementation; then Ad = I, where I is the identity matrix. RILs with missing information for one or more environments were discarded for this analysis. Slopes from FWR assess stability using the phenotypic values corrected by replication and incomplete block as input where all the genetic effects are presented, whereas correlations use the breeding values where only additive genetic effects are considered.

RESULTS
High positive additive-genetic correlations were identified for Ei with AGR40 and K, contrasting with a negative correlation found between Ei and R8, SFL, and RUE ( Table 2). Narrow-sense heritability for Ei reported a value of 0.65. Harvest index was positively correlated with GY, A, R8, and RL, while negatively correlated with R5. HI heritability was similar to Ei with 0.68. RUE, in turn, showed a moderated additive correlation with RL, R1, K, and AGR40, while its heritability was calculated to be 0.36. GY was positively associated with RL, R8, and A, while negatively correlated with R1 and R5. Narrow-sense heritability for GY corresponded to 0.82. Other high genetic correlations include AGR40 with K, RL with R1, and R8 ( Table 2). The descriptive statistics of mean, maximum, and minimum for the traits here considered are reported in Table 3.
The efficiency of light interception is mainly determined by the average canopy coverage growth rate during the first 40 days of the growing season with a path coefficient of 0.86 ( Figure 1A). Other variables influencing Ei include days to R1 and K with path coefficients of 0.07 and 0.12, respectively. AGR40 along with LAI control K showing path coefficients of 0.59 and 0.19. RUE is positively influenced by days to seed beginning,   intrinsic water use efficiency, reproductive length, light extinction coefficient, and photosynthesis ( Figure 1B). Path coefficients for these associations varied from 0.73 to 0.13 with high values for R5, iWUE, and RL primarily. An increase of one standard unit of R5 or RL augments 0.73 and 0.56 standard units of RUE, respectively. In contrast, LAI and AGR40 negatively influence RUE with reduction of −0.30 and −0.22 standard units in RUE when one standard unit of LAI or AGR40 is increased, respectively. The average canopy coverage growth rate during the first 40 DAP also showed a positive effect in HI with a coefficient of 0.33. Apparent harvest index is highly influenced by photosynthesis, length of seed-filling period, average canopy coverage growth rate during the first 40 days, reproductive length, and days to R5 (Figure 1C). All these variables are positively FIGURE 2 | Undirected model through the LASSO algorithm for additive-genetic relationship among physiological and phenological traits with light interception efficiency (Ei), radiation use efficiency (RUE), harvest index (HI), and grain yield (GY ) in a maturity-controlled panel of soybean. Three hundred eighty-three recombinant inbred lines (RILs) evaluated in three environments. A, photosynthesis; AGR40, average canopy coverage growth rate during the first 40 DAP; K, light extinction coefficient; iWUE, intrinsic water use efficiency; LAI, leaf area index; R1, days to flowering; R5, days to beginning of seed formation; R8, days to maturity; RL, reproductive period length; SFL, seed-filling length; LASSO, least absolute shrinkage and selection operator. related to HI except by R5 with a negative path coefficient of 0.16. Photosynthesis presented the highest path coefficient for HI with 0.57; thus, an increase in one standard unit of A would produce a positive change in 0.57 standard units of HI. SFL and AGR40 also positively contribute to HI, where a change of one standard unit of either SFL or AGR40 produces an augment of 0.33 standard units on HI. The lowest path coefficient was observed for RL with 0.21. Grain yield was positively associated with harvest index, radiation use efficiency, and light interception efficiency with path coefficients of 0.50, 0.20, and 0.19, respectively. Thus, a change in one standard unit of HI promotes an increase in 0.50 standard units in GY. Contrarily, days to flowering negatively influenced the grain yield in soybean, showing a path coefficient of 0.37 ( Figure 1D). Trends in the general model were kept, with A and RL influencing HI and AGR40 explaining changes in Ei; while RL, R5, iWUE, and LAI were the main variables affecting RUE.
The undirected model (Figure 2) showed a straight influence of RL, R1, and HI in final GY, while HI is directly associated with photosynthesis. This diagram also depicts the relationship between RL, R1, and R8, with the last phenological stage connected to a node mainly associated with light interception through the variables SFL, Ei, AGR40, and K. RUE, LAI, and iWUE were not clustered with other traits through this undirected methodology. Finally, environmental stability was high for harvest index with Kendall ranking correlation ranging from 0.54 to 0.77 (Table 4). Light interception efficiency showed a high correlation for the locations evaluated during 2018 with a value of 0.82 but a limited correlation when we compared 2017 and 2018 environments. Radiation use efficiency, in turn, presented a moderate correlation when compared with environments from the same location through the years but poor stability between different locations in different years.
When the stability was assessed through the slopes from the joint regression including not only an additive-genetic effect but also epistasis and the reduced dominance remaining, we found a moderate-to-high stability for Ei, RUE, and HI with distributions centered at 1.0 and narrow interquartile range (IQR) of 0.48, 0.02, and 0.09, respectively (Figure 3). Grain yield, in turn, showed medium-to-low stability with minimum and maximum values of −1.3 and 3.6 and IQR of 1.1.

DISCUSSION
Path analysis is a multivariate methodology closely related to multivariate regression where the path coefficients correspond to standardized regression coefficients for the linear model suggested by the path diagram (Walsh and Lynch, 1998). The efficiency of light interception is directly affected by canopy architecture and function (Bai et al., 2016;Chavarria et al., 2017). Changes in one standard unit of AGR40 are associated with changes in 0.86 standard units of Ei. Our results indicate that the efficiency of light interception is mainly a function of how fast the canopy develops during the first stages rather than the maximum LAI achieved. This results also points out the importance of agronomic decisions affecting early canopy development such as distance between plants, distance between rows, and planting date (Shibles and Weber, 1966;Westgate et al., 1997;Andrade et al., 2002;Edwards et al., 2005) as viable strategies to maximize light interception. Additionally, because of the indirect relationship of AGR40 and GY in the integrate path diagram, it is suggested that capitalizing in early light captured not only increased Ei but also might improve grain yield (Figure 1D). The positive effect of canopy coverage rate on grain yield is in accordance with previous reports in soybean and corn (Luque et al., 2006;Xavier et al., 2017a). Light extinction coefficient also influenced Ei, since according to equation (2), K directly participates in the determination of the amount of solar light remaining after passing through layers of LAI (de Wit, 1965;Impens and Lemeur, 1969;Wang et al., 2007;Zhang et al., 2014). Therefore, greater K averages suggest planophile canopies with higher light attenuation, while solar radiation passes through the leaves. However, high K may also imply less light interception in the lower third of the canopy and probably less canopy photosynthesis as demonstrated by Chen et al. (1994), who showed that upright leaves produced up to 25% higher canopy photosynthesis compared with planophile canopies. Our results are also coherent with the previous finding of Duursma et al. (2012) who described light interception through a simple model involving crown density and leaf dispersion, two variables analogous to canopy coverage and light extinction coefficient. LAI plays an indirect role in Ei through its influence in K that is explained by the multiplicative effect of LAI and K in equation (2). Thus, greater LAI augments the number of layers that light must pass through, increasing the likelihood of solar radiation trapped by the leaves.
Radiation use efficiency is considered the physiological trait that will be the focus for new increases in grain yield to bridge the gap between current and potential values (Melis, 2009;Payne et al., 2012;Reynolds et al., 2012). RUE indicates the capability to transform solar radiation, a free resource, into biomass through the plant metabolism. Our results indicate that this efficiency is mainly associated with phenological traits. Longer reproductive length has been previously associated with higher grain yield in soybean (Xavier et al., 2017a), which, along with extended R5, would allow to create stronger sources with extra photosynthates to later being translocated to pods and grains (Board and Harville, 1993;Board and Tan, 1995;Board and Kahlon, 2011). Intrinsic water use efficiency, even more than photosynthesis (∼4-fold), was also positively associated with RUE, indicating that high photosynthetic rates alone are not enough to produce high biomass per unit of light intercepted. High iWUE can reduce the loss of carbon fixation under short water deprivation events (Blankenagel et al., 2018), limiting the RUE decrease. In soybean, iWUE demonstrated independent variation for both photosynthesis and stomatal conductance with variation mainly attributed to changes in stomatal conductance rather than photosynthesis (Gilbert et al., 2011). Reduction in the seasonal RUE and GY in soybean is reported as a consequence of water stress during the pod initiation and seed filling (De Costa and Shanmugathasan, 2002;Adeboye et al., 2016). When water deprivation occurs, crop growth rate and dry matter production are reduced as a consequence of a net assimilation decrease mediated by the lack of CO 2 coming into the leaf (Board and Kahlon, 2011). Likewise, increased daily saturation vapor pressure deficit, a key variable controlling transpiration, is reported as a factor for reducing RUE in sorghum and maize even under wellwatered conditions (Stockle and Kiniry, 1990). The importance of considering water dynamic in conjunction with carbon metabolism is also pointed out by Wu et al. (2019), who conclude that the impact of enhancing photosynthesis on yield is strongly dependent on the degree of water limitation. These authors suggest modeling the photosynthesis-stomatal conductance relationship as a key factor to better quantify theoretical impacts of improving photosynthesis. The influence of AGR40 and LAI is explained by their direct and indirect contribution to the cumulative light intercepted (Figure 1A), which corresponds to the denominator of RUE. These negative associations are also a consequence of the nonlinear relationship between light intercepted and biomass produced when larger amounts of light are intercepted (Edwards et al., 2005). Thus, under greater LAI that is likely promoted by high AGR40, the soybean cannot maintain a constant rate of biomass production per new amount of light intercepted, diminishing the overall RUE. The asymptotic effect of 90% of total biomass in soybean was reported by Edwards et al. (2005), suggesting that any extra light intercepted above 911 MJ m −2 would produce a marginal augment of up to 10% in total biomass, with even increases of just 5% when PAR intercepted changed from 911 to 1142 MJ m −2 . Reduction in LAI might contribute to enhance RUE in soybean, and its feasibility is not completely discarded since it has been demonstrated that 1/3 defoliation does not affect yield and quality as long as LAI is above 3.0 (Board and Harville, 1993;Liu et al., 2008).
Harvest index is an indication of reproductive effort with a large contribution on grain yield achievements in cereals during the last decades, especially after the "green revolution" (Donald and Hamblin, 1976;Hay, 1995;Evans et al., 1999;Sadras and Lawson, 2011). Our results show a strong relationship between HI and A, SFL, and RL, with A being the most remarkable, with increases in one standard unit of A promoting changes in 0.57 standard units in HI. Augmented HI in soybean seems to be a priority to improve grain yield. Achieving this challenge can be made using different approaches, including semi-determinate cultivars through introgression of genes DT1 and DT2, which are presented as promising high-yielding materials with less aboveground vegetative biomass (Kato et al., 2019). Increased seed weight through a higher photosynthetic rate, an extended seed-filling period, and augmented seed size is also proposed as a viable strategy. Photosynthesis is the main process accounting for carbon fixation and, along with respiration, controls the carbohydrates available for grain filling (Board and Kahlon, 2011;Taiz et al., 2014). Extended filling period along with high CO 2 fixation rates are suggested as synergic events boosting the grain yield formation in soybean. Increased partitioning of carbohydrates is associated with better seed set in soybean (Board and Kahlon, 2011;Rotundo et al., 2012), as the availability of photosynthates during the filling period determines if the seed growing is sink or source limited, with sink limitation occurring when photosynthesis increases and source limitation when photosynthesis is reduced (Egli and Bruening, 2001). The strong contribution to final GY from HI aligns with reports in wheat, where a significant positive correlation between photosynthesis traits, HI, and GY, is documented (Foulkes et al., 2011;Xiao et al., 2012;Carmo-Silva et al., 2017). In soybean, in turn, a recent study showed a high genetic correlation between A and GY (Lopez et al., 2019). The importance of the combination of photosynthesis and duration of the reproductive stage was also demonstrated by Boerma and Ashley (1988), who reported a high correlation of 0.78 between GY and the product canopy apparent photosynthesis by seed filling period. Augmented light interception during early stages in soybean increases both number of nodes and number of pods, with the positive effect not only in HI but also in GY (Board et al., 1992;Board and Tan, 1995). The number of pods per reproductive node was reported as the main yield component in soybean when a path analysis was carried out back in 1999 (Board et al., 1999), whereas a high genetic correlation between early canopy development and GY is reported in soybean (Xavier et al., 2017a,b). Days to R5 showed a negative moderate effect in HI, which is explained by the direct effect of extended R5 in the seed-feeling period considering that the panel we evaluated is maturity-controlled. Progress to increase reproductive length should focus on reducing the time required to flowering since increasing time to maturity involves the logistic problem associated with changes in the maturity group. Phenotypic variation for days to R1 exists since the data set we collected showed a range of variation for R1 from 13 to 20 days being as early as 34 days after planting ( Table 2). Although flowering in soybean is under the control of photoperiod, temperature, irradiance, and eight "E" genes (Hadley et al., 1984;Cober et al., 2014), insensitive genotypes "day neutral" have been identified (Criswell and Hume, 1972;Polson, 1972;Nissly et al., 1981;Shamugasundaram, 1981;Islam et al., 2019), suggesting that cultivars with less sensitivity to photoperiod might be produced with a theoretical positive effect on GY.
Despite that the unsupervised method cannot establish a direction and contribution value for each interaction, the graphical model based on the LASSO algorithm revealed most of the relationship we found through the path analysis. The LASSO method not only minimizes the residual sum of squares but also constrains some coefficients to exactly zero, performing a parallel variable selection (Tibshirani, 1996). Thus, the absence of connection between RUE, iWUE, and LAI with the full graphical model might be a consequence of overall weak correlations for each of these three traits, with most of the other variables ( Table 2) making the algorithm to minimize their contribution to the whole model. In the case of RUE, the lack of clustering can be associated also with the moderate inconsistency on the ranking of RILs among the environments mainly promoted by changes in location (ACRE vs. RMN). These changes in ranking, GxE, found for RUE contrast with the low sensitivity to variations in location and year showed by HI and, along with the low dispersion of the FWR slope, suggest high stability and fewer requirements of multi-environment trials during the HI determination. Light interception coefficients, in turn, are strongly influenced by changes in the canopy among years but highly correlated among locations. Differences among years in this study may be explained by particular responses of lines to planting dates since ACRE_2017 was planted late (May 31) compared with ACRE_2018 (May 22) and RMN_2018 (May 17). A negative effect of late planting in LAI is reported for soybean (Parvez et al., 1989;Tagliapietra et al., 2018) with a detrimental effect in grain yield also (Egli andBruening, 1992, 2000;Boote et al., 1998;Egli and Cornelius, 2009).
Genotype × environment (GxE) stability is a desirable performance when new cultivars are released (Bondari, 2003). In soybean, Xavier et al. (2018) recently reported seven genomic regions located on chromosomes 4, 6, 9, 13, 15, and 18 contributing to the GxE response. Likewise, another single region linked to yield stability on chromosome 18 was also documented. In this study, two high-yielding environments (ACRE_2018 and RMN_2018) characterized by extended light harvesting period through higher number of degree days to harvest, well distributed rainfall, and higher mean temperature during the growing season were presented (Table 1). In contrast, a single low-yielding environment (ACRE_2017) associated with reduced growing season for late planting and lower temperature was classified. Our results present stability evaluated through the slope of the FWR (Finlay and Wilkinson, 1963) revealing better stability for Ei, RUE, and HI than GY per se. High stability for harvest index in determinate and indeterminate soybean evaluated in the south (Gainesville, FL) and north of the United States (Ithaca, NY) is already reported, aligning with our findings (Spaeth et al., 1984). According to the stability classification, the three efficiencies we assessed showed stability type II, meaning the response to the environment is the same as the mean response with the regression's slope equal to 1 (Bernardo, 2002). In this case in particular, type II stability suggests that high adaptation to the environment evaluated aligns with the original goal of the SoyNAM population: "improve the yield potential of soybean varieties" with main focus on the maturity group (MG) III 1 . We observed less stability for GY denoted through the wide distribution of the slopes around the center with 25% (70) of the RILs showing slopes >1.5, suggesting stability type III, better performance than the average in favorable environments but less than average in unfavorable environments (Bernardo, 2002). On the contrary, 32% (89) showed a probable stability type IV for GY with slope < 0.5, implying better than the average response in unfavorable environments but less than the average performance in favorable environments (Bernardo, 2002). Fifty percent (22) of the lines with suggested type III stability come from families classified as high yielding under drought conditions, whereas 26% (28) and 15% (20) have diverse ancestry and high-yielding genetic background, respectively. From the lines with proposed stability type IV, 46% (60) derive from the high-yielding background, 25% (27) come from families with diverse ancestry, and less than 1% (2) come from high yielding under drought. Our results suggest that the material originally bred for environments with water limitations also performs well in favorable environments as observed by Ceccarelli (2015) in barley. In this case, the genetic background for tolerance to water deficit did not impose a penalty to compete in such considerable good environments. Recombinants with high-yielding genetic background respond better to environments considered "unfavorable, " indicating that high-yielding genetic background confers advantages in a wide range of environments.

CONCLUSION
Directed and undirected methodologies are able to capture the main relationships underlying light interception efficiency, radiation use efficiency, harvest index, and grain yield, bringing new insights to strategically approach the breeding of complex traits. Advances in soybean productivity must encompass optimization in phenological and physiological processes where improvement on harvest index appears as a suitable strategy to achieve fast and significant advances in final grain yield. Breeding strategies to increase photosynthesis and water use efficiency are a priority because of their positive impact not only in harvest index but also in radiation use efficiency. Although extending the reproductive period length without affecting the total length cycle would require reducing the photoperiod sensitivity and probably increasing the tolerance to cold temperature during the early stages, this phenological improvement has a potential return in the overall soybean perform involving grain yield, harvest index, and radiation use efficiency. Trait stability for individual efficiencies accounting for grain yield, evaluated through the joint regression's slope, is higher than the stability for grain yield itself, which represent an advantage if selecting for Ei, RUE, or HI was implemented.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at https://www.soybase.org/ SoyNAM/index.php.

AUTHOR CONTRIBUTIONS
ML and KR conceived and designed the experiments. ML and FF collected the field data. ML conducted the data analysis and interpretation, wrote and edited the manuscript. KR coordinated-supervised the research and ensured the funding. All authors contributed to the article and approved the submitted version.
breeding lab at Purdue for their assistance in the field work. Likewise, thanks to the editor and the independent reviewers for their critical analysis of the manuscript. Finally, we also thank the North Central Soybean Research Program (NCSRP) and the United Soybean Board (USB) for funding the development of the Nested Association Panel (SoyNAM).

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2021. 651241/full#supplementary-material Supplementary Figure 1 | Comparative for yield and maturity between the phenology-controlled panel and the full Soy-NAM panel.
Supplementary Table 1 | Name, class, and program of origin for the 32 families evaluated.