Benchmarking Nutraceutical Soybean Composition Relative to Protein and Oil

The aim of this study was to explore relationships of protein, oil, and seed weight with seed nutraceutical composition, focusing on total isoflavone (TI) and total tocopherol (TT) contents across genotypic and environmental combinations in soybean. We conducted a synthesis-analysis of peer-reviewed published field studies reporting TI, TT, protein, oil, and seed weight (n = 1,908). The main outcomes from this synthesis-analysis were: (i) the relationship of TI to protein concentration was positive, although along the upper boundary TI decreased with increases in protein; (ii) the relationship of TT to oil concentration was positive, but inconsistent when oil was expressed in mg per seed; and (iii) as seed weight increased, TI accumulation was less than proportional relative to protein concentration, and TT decreased more than proportionally relative to oil concentration. The associations between nutraceuticals and protein, oil, and seed weight reported in the present study can serve as foundational knowledge for soybean breeding programs interested in predicting and selecting for enhanced meal isoflavone and/or oil tocopherol contents.


INTRODUCTION
Seed protein and oil in soybean [Glycine max (L.) Merrill] define the overall quality value for international trade markets (1). More recently, soybean has also ranked as one of the top sources of highly valuable nutraceutical compounds with health-enhancing properties (2). These active substances of plant origin (phytocomplexes), such as those extracted from soybean seeds, are important given their proven efficacy and benefits for human health (for prevention or supportive treatment of some pathological conditions) in addition to their nutritional content (3-5). Because of these properties, these compounds are usually referred to as nutraceuticals (6). Among the nutraceutical seed components, isoflavones (minor components of meal) play a key role in the prevention and treatment of chronic diseases (e.g., cancer, heart disease, osteoporosis) (7) due to their anti-estrogenic and antioxidant activities (8). Several health benefits have also been associated with tocopherols, minor components of oil with lipophilic antioxidant properties that play a critical role in delaying the pathogenesis of cardiovascular and neurodegenerative diseases (e.g., Alzheimer's and Parkinson's) (9).
Seed size, defined by seed weight, is an important characteristic for different types of soybean foods. The deposition patterns of protein, oil, isoflavones, and tocopherols mimic seed dry mass accumulation during the seed filling period in soybean (1). Thus, final seed weight may act as an integrative indicator of the ecophysiological processes occurring during this period, when these seed components accumulate. However, the association of nutraceuticals with seed weight has not been comprehensively analyzed. Those relationships may be tightly linked to the particular genotypic and environmental combinations explored in each study, making it difficult to assess the biological limits of nutraceutical composition relative to protein and oil. A synthesis-analysis, which quantitatively combines results from different studies, may overcome those limitations, expanding the level of inference and increasing the validity of conclusions (31,32). A similar approach has previously been used to analyze relationships between grain yield and a given resource, such as water availability (33) or nutrients (N, P, and K) (34) in wheat, or in soybean (35-37).
Several review studies have synthesized knowledge on soybean seed composition (6,38-41). However, a comprehensive characterization of the variation in seed nutraceuticals in relation to relevant agronomic traits under diverse genotypic and environmental combinations is still lacking. Therefore, the aim of this study was to explore relationships of protein, oil, and seed weight with seed nutraceutical composition, focusing on total isoflavone (TI) and total tocopherol (TT) contents across different genotypic and environmental combinations in soybean.

Database Compilation and Variables Evaluated
The literature search for the database compilation was conducted by surveying peer-reviewed journal articles published over the past decades, following the procedures described in previous review papers (35,42). Briefly, the collected studies were conducted under field conditions and reported total isoflavone (TI) and total tocopherol (TT) contents (expressed as mg and µg component per seed on a dry basis, respectively), as well as protein and oil concentrations (expressed as percentage of dry weight) and/or contents (mg component per seed on a dry basis). Seed protein and oil concentrations from all studies were adjusted to a standard moisture basis of 130 g kg⁻¹, because wet-basis moisture content is generally used for industrial and commercial purposes (43). Only two studies met the criterion of evaluating all four above-mentioned seed chemical traits (20,22), yielding insufficient data to address the objectives of our work. Thus, two databases were generated: the first comprising protein and TI, involving 12 studies (Table 1) from Argentina (3), Brazil (2), Canada (4), Korea (1), and the United States (US) (2); and the second comprising oil and TT, involving 5 studies (Table 2) from Argentina (1), China (1), and the US (3). Since not all studies reported seed weight, the number of cases for concentrations and contents differed within each database. Database 1 includes 1,624 data points for protein concentration and total isoflavone, and 1,600 data points for protein content. Database 2 includes 284 data points for oil concentration and total tocopherol, and 255 data points for oil content. In all cases, the databases included studies primarily focused on quantifying soybean seed composition as affected by G × E combinations. Data were retrieved directly from tables or digitized from figures.
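The adjustment to a 130 g kg⁻¹ moisture basis can be sketched as follows. The source does not state the formula it used; this example assumes the standard rescaling of a concentration between two wet-basis moisture fractions, and the function name and the input value are hypothetical.

```python
def to_moisture_basis(conc_pct, from_moisture=0.0, to_moisture=0.13):
    """Re-express a concentration (% of seed mass) on another moisture basis.

    Moisture is given as a wet-basis fraction (0.13 = 130 g water per kg seed).
    Assumption: the mass of the component itself does not change, only the
    total seed mass it is divided by.
    """
    for m in (from_moisture, to_moisture):
        if not 0.0 <= m < 1.0:
            raise ValueError("moisture fractions must lie in [0, 1)")
    return conc_pct * (1.0 - to_moisture) / (1.0 - from_moisture)


# Hypothetical value: 40% protein on a dry basis.
protein_dry_basis = 40.0
protein_13pct = to_moisture_basis(protein_dry_basis)
print(f"{protein_dry_basis:.1f}% (dry basis) -> {protein_13pct:.1f}% (13% moisture)")
```

Under this assumption a dry-basis value is simply multiplied by 0.87, so concentrations reported on a 13% moisture basis are always lower than their dry-basis equivalents.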

Database Analysis
Total isoflavone and total tocopherol contents, as well as protein and oil concentrations and contents from both databases, were analyzed using descriptive statistics: number of observations (n), mean, standard deviation (SD), median, range of variation (minimum and maximum), and the 25% and 75% quartiles, i.e., the interquartile range (IQR), which contains the 50% of all observations centered around the median. Analyses were complemented with histograms to visualize the distributions of seed components (GraphPad Prism version 7.0). Seed total isoflavone-to-protein and seed total tocopherol-to-oil relationships were modeled using quantile regression techniques (55,56). Envelopes portraying the maximum (0.99 quantile) and minimum (0.01 quantile) boundaries, and the envelopes enclosing 50% of all the observed data, namely the IQR as described in Salvagiotti et al. (57), were obtained using Blossom software (58). In addition, the total isoflavone-to-protein and total tocopherol-to-oil ratios for the boundary functions were further studied via linear and quadratic regressions, testing seed weight (mg per seed) as the independent variable explaining variation in soybean seed composition. Model selection was based on residual analysis and the coefficient of determination (R²) (59). For significant quadratic regressions, the first derivative of the function was analyzed to identify the seed weight at which the accumulation rate of a trait is zero.
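The boundary envelopes can be illustrated with a small sketch. The study used Blossom software; the code below is not that implementation but a brute-force linear quantile regression in plain Python. It relies on the standard result that, with one predictor and an intercept, a minimizer of the check (pinball) loss exists among the lines passing through two data points. Function names and the toy data are hypothetical.

```python
from itertools import combinations


def pinball_loss(xs, ys, intercept, slope, tau):
    """Check loss: residuals above the line weigh tau, below it (1 - tau)."""
    total = 0.0
    for x, y in zip(xs, ys):
        r = y - (intercept + slope * x)
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total


def fit_quantile_line(xs, ys, tau):
    """Return (intercept, slope) of the tau-quantile line.

    Tries every line through two points with distinct x and keeps the one
    with the smallest check loss. Cubic in n, so only for small examples.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a valid candidate
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = pinball_loss(xs, ys, intercept, slope, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x")
    return best[1], best[2]


# Toy values only (not taken from the databases).
x = [0, 1, 2, 3, 4]
y = [0, 1, 2, 3, 10]
for tau in (0.01, 0.50, 0.99):
    b0, b1 = fit_quantile_line(x, y, tau)
    print(f"tau={tau:.2f}: y = {b0:.2f} + {b1:.2f} x")
```

With these toy values the 0.99-quantile line tilts toward the single high point while the median line ignores it, which is why upper and lower quantiles can serve as boundary functions for a scatter of observations.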

Variation of Soybean Seed Protein, Total Isoflavone, Oil, and Total Tocopherol
Overall mean seed protein concentration, protein content, and total isoflavone were 36%, 87 mg seed⁻¹, and 1.76 mg seed⁻¹, respectively (Table 3). Differences between maximum and minimum values were 72 and 366% for protein concentration    Figures 1A,B). Total isoflavone was closer to a normal, mesokurtic distribution (kurtosis = −0.4) (Figure 1C). The distributions of oil concentration and total tocopherol (i.e., database 2) were slightly skewed (skew = −0.11 and −0.73, respectively), while oil content was normally distributed; all three variables exhibited a slightly flat distribution (kurtosis = −0.58, −0.23, and 0.18 for oil concentration, oil content, and total tocopherol, respectively; Figures 1D-F).
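The descriptive statistics reported here can be reproduced for any single trait with a short routine. This is a minimal sketch using only the Python standard library; the skewness and excess kurtosis use simple population-moment formulas, which may differ slightly from the estimators in GraphPad Prism, and the sample values are hypothetical.

```python
import statistics


def describe(values):
    """Summary statistics for one seed trait (list of numbers)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # 25%, 50%, 75% cut points
    # Central moments about the mean (population form, divisor n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all observations are identical")
    return {
        "n": n,
        "mean": mean,
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "skew": m3 / m2 ** 1.5,
        # Excess kurtosis: about 0 for a mesokurtic (normal-like) shape,
        # negative for flatter and positive for more peaked distributions.
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical protein concentrations (%), not taken from the databases.
protein = [31.2, 33.5, 34.8, 35.9, 36.4, 37.0, 38.1, 39.6, 41.3]
for name, value in describe(protein).items():
    print(f"{name}: {value:.2f}")
```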

Nutraceutical Soybean Composition Relative to Protein and Oil
For the relationships of seed nutraceuticals with protein and oil, the slopes of the linear regressions for percentiles 1 and 99 were 0.005 and 0.10 mg TI seed⁻¹ per % protein (Figure 2A), 0.004 and 0.05 mg TI seed⁻¹ per mg protein seed⁻¹ (Figure 2B), 10.0 and 22.6 µg TT seed⁻¹ per % oil (Figure 2C), and 4.2 and 15.0 µg TT seed⁻¹ per mg oil seed⁻¹ (Figure 2D), respectively. Thus, the upper boundary lines represent maximum seed protein (Figures 2A,B) or oil (Figures 2C,D), implying that TI or TT are limited by factors other than protein or oil, respectively. In contrast, the lower boundary lines indicate maximum TI or TT dilution, with protein (Figures 2A,B) or oil (Figures 2C,D) as the main limiting factors for TI or TT, respectively. The distribution of data points in the TI-to-protein concentration relationship (Figure 2A) depicts a positive association up to 35% protein (13% moisture basis), above which increases in protein concentration seem not to be accompanied by increases in seed TI, plausibly portraying the trade-off generally reported between these two traits. With respect to the TT-to-oil concentration relationship, there was no clear plateau in seed TT content at high levels of oil concentration (Figure 2C), suggesting that the soybean TT accumulation rate per unit of oil remains constant even at high oil concentrations (above 20% on a 13% moisture basis). However, when the relationships of both protein and oil are expressed in terms of their contents (i.e., in mg of protein or oil per seed), the TI-to-protein relationship presented a larger variation than the TT-to-oil relationship. Indeed, over the range of oil content variation (20-61 mg seed⁻¹), TT varied 151%, whereas over the range of protein content variation (30-138 mg seed⁻¹), TI varied 3,358%, about 22 times the variation observed for TT. This greater variability may explain the significant and positive association between TI and protein content (p < 0.0001, Figure 2B), whereas the association between TT and oil content was non-significant (p = 0.70, Figure 2D).
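The percentages of variation quoted in this section can be put in context with a short calculation. The source does not define how variation was computed; the sketch below assumes it is the range expressed relative to the minimum, (max − min) / min × 100, and applies it to the content ranges reported in the text. The function name is hypothetical.

```python
def relative_range_pct(minimum, maximum):
    """Range as a percentage of the minimum (assumed definition)."""
    if minimum <= 0:
        raise ValueError("minimum must be positive")
    return (maximum - minimum) / minimum * 100.0


# Content ranges reported in the text (mg per seed).
oil_var = relative_range_pct(20, 61)
protein_var = relative_range_pct(30, 138)
print(f"oil content range: {oil_var:.0f}%")
print(f"protein content range: {protein_var:.0f}%")

# Ratio of the reported TI and TT variations (3,358% and 151%).
print(f"TI variation / TT variation: {3358 / 151:.1f}")
```

Under this assumed definition, protein content spans a wider relative range than oil content, and the reported TI variation is roughly 22 times the reported TT variation.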

Relations of Nutraceuticals, Protein and Oil With Individual Seed Weight
Final individual seed weight is a useful variable integrating physiological responses, with changes in seed weight closely related to variations in seed components. Indeed, across a range of environments, growing seasons, and genotypes, protein concentration and seed weight were positively and linearly related (Figure 3A), with a slope of 0.03% protein per mg seed⁻¹. Interestingly, a quadratic model was fitted for TI content and seed weight (Figure 3B), indicating that TI increases less than proportionally with increments in seed weight (0.02 mg seed⁻¹ increase in TI up to 91 mg seed⁻¹, then decreasing steadily until a seed weight of 258 mg seed⁻¹). Over a similar range of seed weight, TI is therefore diluted owing to a lower deposition (Figure 3B) relative to protein (Figure 3A). On the other hand, the responses of both oil concentration and TT content to seed weight were linear and negative (Figures 3C,D), with slopes of −0.01% oil and −0.34 µg TT seed⁻¹ per mg seed⁻¹, respectively. Thus, although both traits decrease with increasing seed weight, TT decreases at a lower rate than oil, with both traits exhibiting dilution at large seed weights.
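For a quadratic response y = a + b·x + c·x², the first derivative is b + 2·c·x, so the rate of change is zero at x = −b / (2·c). The sketch below computes that turning point and the local rate of change. The coefficients are hypothetical and are not the fitted values from Figure 3B.

```python
def quadratic_rate(b, c, x):
    """First derivative of a + b*x + c*x**2 at x (change in y per unit x)."""
    return b + 2.0 * c * x


def turning_point(b, c):
    """x at which the first derivative of the quadratic equals zero."""
    if c == 0:
        raise ValueError("c = 0: the model is linear and has no turning point")
    return -b / (2.0 * c)


# Hypothetical coefficients for TI (mg per seed) against seed weight (mg).
b, c = 0.02, -0.0001
x_star = turning_point(b, c)
print(f"rate is zero at a seed weight of about {x_star:.0f} mg")
print(f"rate at 50 mg:  {quadratic_rate(b, c, 50):+.3f} mg TI per mg seed")
print(f"rate at 200 mg: {quadratic_rate(b, c, 200):+.3f} mg TI per mg seed")
```

With a negative quadratic coefficient the rate is positive below the turning point and negative above it, which is the pattern of less-than-proportional accumulation followed by dilution.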

DISCUSSION
Broad variation in soybean seed nutraceuticals relative to protein and oil components revealed that negative associations might not imply a detrimental effect on the synthesis of those compounds. This suggests that in some cases physical or genetic limitations at the seed level may exist, altering the seed deposition rate and plausibly playing a key role in the physiological basis of the direction of the interrelationships between seed traits. The main goal of the current synthesis-analysis was to provide a more holistic interpretation of soybean seed nutraceuticals linked to agronomically relevant seed composition traits (e.g., oil, protein) under the umbrella of broad genotypic and environmental combinations. Although several studies presented associations among different seed components, those associations might be study-specific, limited to the conditions explored and with a narrow level of inference. In addition, the main roadblocks highlighted by this review are the lack of uniformity in reporting units, the lack of data on the interrelationships between nutraceuticals and agronomically relevant seed traits, and, lastly, the lack of proper context when changes in concentration are interpreted without considering individual seed weight. These are the most critical elements needing full attention from researchers, agronomists, breeders, and stakeholders investigating this seed quality topic in the near future.
Overall, the present synthesis-analysis showed that TI was positively related to protein concentration, but at protein concentrations above 35% (13% moisture basis) increases in protein seem not to be proportionally accompanied by increments in seed TI. Indeed, Charron et al. (24) observed a positive correlation of TI with protein in only 5 out of 17 soybean cultivars, when protein percentage was below 35%. We found that for protein values above 35% TI increments decreased, highlighting the relevance of a more holistic approach when analyzing relationships between seed components, since several studies in the literature that reported a trade-off between these traits analyzed only datasets with protein levels above 35% (23,28,29,63,64). On the other hand, the relationship between TT and oil concentration was also positive, as previously shown in the literature (65-67), suggesting a universal relationship between both variables. A negative TT-to-oil relationship was reported only when evaluating genotypes modified for low-linolenic oil (19), which might respond indirectly to the reported positive correlation between linolenic acid and TT (68), and is not representative of commercial cultivars.
Previous investigations studied associations between seed traits through correlation analyses, which do not define the basis of the associations and may wrongly imply cause-effect between highly correlated variables (69). In the present study, the boundary analysis using quantile regression enabled us to obtain more meaningful agronomic and physiological conclusions. Although this analysis has been extensively used for studying relationships between resource availability and seed yield (33,35,57,70), it has not been used for addressing changes in seed composition. For instance, comparing the TI-to-protein concentration and TI-to-protein content relationships, the analysis showed that increases in TI accumulation were lower in the second case, suggesting that physical limits may exist for TI increments, especially in the range of high protein levels. Regarding oil content, the present synthesis-analysis showed a non-significant relationship between TT and seed oil content. Nolasco et al. (71) found a positive relationship between these traits in sunflower (Helianthus annuus L.), but it is important to highlight that oil content in sunflower seeds is threefold that in soybean (72), and thus more variation in oil content and a closer relationship between TT and oil content are expected. Taking into account the narrow variation of TT across the range of oil, there appears to be limited opportunity for further increasing TT content in soybean seeds. Several studies have reported variations in soybean seed isoflavones (10,13,16,29,73-77) as well as tocopherols (12,18,65,67,78). Nonetheless, those studies rarely addressed interactions between these nutraceutical components and agronomically relevant traits such as protein, oil, and seed weight.
Another major issue is that seed protein and oil are frequently quantified in terms of their concentrations and not their contents, providing little or no insight into the physiological mechanisms underlying seed quality metabolism (79). Thus, the inclusion of seed weight is key to accounting for variations in seed quality traits, given that both protein and oil accumulation are relevant processes during the seed filling period. The present synthesis-analysis showed that when seed weight was above ca. 258 mg seed⁻¹, the accumulation of TI per unit of seed weight was less than proportional; however, protein concentration continued increasing above this seed weight. From the oil perspective, both TT and oil concentration presented limitations to increases in their accumulation as seed weight increased. This suggests that both TT and oil concentration are largely maintained, being only smoothly diluted, as seed weight increases, whereas protein can still concentrate as seed weight increases. TI, however, showed a clear dilution.
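The distinction between concentration and content can be made concrete with the usual conversion: content (mg seed⁻¹) = concentration (%) / 100 × seed weight (mg seed⁻¹), with both terms on the same moisture basis. The sketch below is illustrative only; the function name and the two seed lots are hypothetical.

```python
def content_mg_per_seed(concentration_pct, seed_weight_mg):
    """Convert a concentration (% of seed mass) into a per-seed content (mg).

    Assumes concentration and seed weight share the same moisture basis.
    """
    if not 0.0 <= concentration_pct <= 100.0:
        raise ValueError("concentration must be between 0 and 100%")
    if seed_weight_mg <= 0:
        raise ValueError("seed weight must be positive")
    return concentration_pct / 100.0 * seed_weight_mg


# Two hypothetical seed lots: (protein %, seed weight in mg).
lots = {"small seed": (38.0, 150.0), "large seed": (36.0, 250.0)}
for name, (protein_pct, weight_mg) in lots.items():
    protein_mg = content_mg_per_seed(protein_pct, weight_mg)
    print(f"{name}: {protein_pct:.0f}% protein -> {protein_mg:.0f} mg per seed")
```

In this toy case the lot with the lower protein concentration holds more protein per seed, which is why reporting concentration alone can hide differences in the amount of a component actually deposited.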
The evident dilution of TI suggests a biological limitation to the accumulation of this nutraceutical component that is not related to a physical restriction (i.e., seed size). Instead, environmental conditions during seed development and genetic factors such as gene linkage or pleiotropic effects (i.e., trait-by-trait interactions) could be the processes underlying this limitation at the biosynthesis level (45,63,64,80). Rotundo et al. (81) documented increases in seed size associated with increased protein and reduced oil concentrations. On the other hand, Nolasco et al. (71) for sunflower and Izquierdo et al. (30) for soybean reported that TT content increased in a lower proportion than oil weight per grain. Across the whole range of seed weight explored, which was within that reported for the USDA soybean germplasm collection [40-340 mg seed⁻¹, Nelson and Wang (82)], we observed that as seed weight increased, TT decreased at a higher rate than oil concentration. The lack of association between TT and oil content in our synthesis-analysis could point toward compensatory effects of the opposite responses of total tocopherol content to oil concentration and seed weight.
Seed chemical composition is the result of complex interactions between seed genetic characteristics and the environment (83). In the current study, we have provided a comprehensive and effective approach for understanding the natural variation of seed nutraceuticals in soybean from an ecophysiological perspective, i.e., analyzing their interrelationships with the concentrations and contents of the major seed components (Figure 4). Seed weight, closely linked to seed composition and often overlooked in seed nutraceutical composition studies, emerges as an important trait to be further investigated for addressing both industrial and nutraceutical seed quality. Furthermore, future research is needed to shed light on the physiological mechanisms occurring during the seed filling period, to better understand the effect of environmental conditions (e.g., temperature, water and nutrient availability, radiation), genotype, and/or management practices on modulating changes in soybean seed nutraceuticals. Also, to bridge the gap between soybean matrix constituents, research should advance toward the relationship between seed protein and carbohydrates. As is well documented, increases in protein with reductions in carbohydrates would contribute to enhancing the nutritional value of soybean meal (84,85). The main outcomes presented in this synthesis-analysis provide for the first time, to the best of our knowledge, valuable practical data on the association between nutraceuticals and protein, oil, and seed weight for the soybean crop. This review provides foundational knowledge for soybean breeding programs interested in predicting and selecting for enhanced meal isoflavone and/or oil tocopherol contents, and in their relationships with less costly and more rapidly assessed agronomically relevant seed traits such as protein and oil content.

AUTHOR CONTRIBUTIONS
CC, FS, and IC contributed to conception and design of the study, organized the database and performed the statistical analysis, and prepared and reviewed the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.