Identification of the QTL-allele System Underlying Two High-Throughput Physiological Traits in the Chinese Soybean Germplasm Population

The QTL-allele system underlying two spectral reflectance physiological traits, NDVI (normalized difference vegetation index) and CHL (chlorophyll index), related to plant growth and yield was studied in the Chinese soybean germplasm population (CSGP), which consisted of 341 wild accessions (WA), farmer landraces (LR), and released cultivars (RC). Samples were evaluated in the Photosynthetic System II imaging platform at Nanjing Agricultural University. The NDVI and CHL data were obtained from hyperspectral reflectance images in a randomized incomplete block design experiment with two replicates. The NDVI and CHL ranged from 0.05–0.18 and 1.20–4.78, had averages of 0.11 and 3.57, and had heritabilities of 78.3% and 69.2%, respectively; the values of NDVI and CHL were both significantly higher in LR and RC than in WA. Using the RTM-GWAS (restricted two-stage multi-locus genome-wide association study) method, 38 and 32 QTLs with 89 and 82 alleles and 2–4 and 2–6 alleles per locus were identified for NDVI and CHL, respectively, which explained 48.36% and 51.35% of the phenotypic variation for NDVI and CHL, respectively. The QTL-allele matrices were established and separated into WA, LR, and RC submatrices. From WA to LR + RC, 4 alleles and 2 new loci emerged, and 1 allele was excluded for NDVI, whereas 6 alleles emerged, and no alleles were excluded, in LR + RC for CHL. Recombination was the major motivation of evolutionary differences. For NDVI and CHL, 39 and 32 candidate genes were annotated and assigned to GO groups, respectively, indicating a complex gene network. The NDVI and CHL were upstream traits that were relatively conservative in their genetic changes compared with those of downstream agronomic traits. High-throughput phenotyping integrated with RTM-GWAS provides an efficient procedure for studying the population genetics of traits.


INTRODUCTION
With the rapid development of high-throughput genome sequencing technology, high-quality genotype data can be obtained quickly and cheaply, enabling the detection of quantitative trait loci (QTL) at a high level of resolution through genome-wide association studies (GWAS) (Shendure and Ji, 2008;Visscher et al., 2017). Previous QTL studies have primarily focused on the collection of phenotype data (phenotyping) for agronomic traits to achieve various breeding objectives. As upstream biological traits often underlie breeding target traits, there is much interest in identifying upstream traits for the control of downstream agronomic traits. These upstream traits generally consist of some physiological or biochemical traits that are time-consuming and difficult to identify without the appropriate tools. Both high-quality genotype and phenotype data are required for accurate and powerful QTL detection. Because of improvements in the reliability of current genotyping technologies, obtaining high-quality phenotype data in QTL studies has become a major challenge (Cobb et al., 2013). Recently, spectral reflectance has been developed as a highthroughput phenotyping technique (Rebetzke et al., 2019;Watt et al., 2020). Remote-sensing images have been widely used to measure crop traits, such as plant height, biomass, chlorophyll content, disease susceptibility, drought stress sensitivity, nitrogen content, and yield (Gitelson et al., 2003;Estrada et al., 2015;Nigon et al., 2015, Holman et al., 2016Jay et al., 2017, Salas Fernandez et al., 2017Pérez-Bueno et al., 2019). Specifically, the approach is based on quantifying differences in canopy spectral reflectance among varieties for the aforementioned traits . The high-throughput phenotyping platform usually consists of several sensors and automatic systems and provides an efficient method for characterizing plant phenotypes (Furbank and Tester, 2011).
Multispectral and hyperspectral reflectance images have been widely used in high-throughput phenotyping platforms; the spectral index has been found to be closely related to the growth and development of crops (Duan et al., 2017). In studies of hyperspectral remote-sensing technology, vegetation indices are typically used to maximize the relationship between certain reflectance wavelengths and plant function when the effect of background noise is controlled (Huete et al., 2002;Hatfield and Prueger, 2010). Most of the vegetation indices are correlated with plant parameters, such as pigment status, grain yield, NDVI (normalized difference vegetation index), RVI (ration vegetation index), and GNDVI (green and nearinfrared difference vegetation index) (Wiegand et al., 1991;Peñuelas et al., 1997;Lewis et al., 1998;Rutkoski et al., 2016). NDVI is calculated based on the near-infrared spectrum and red-light spectrum (Tucker, 1979), which has been extensively used to evaluate crop growth and estimate nitrogen content, nitrogen uptake, and nitrogen efficiency in crops (Erdle et al., 2011;Samborski et al., 2015;Foster et al., 2017). Studies of crop diseases have also shown that NDVI can be used for crop disease assessment (Kumar et al., 2016). More recently, studies have shown that NDVI is closely related to crop yield (Hassan et al., 2019). Zhang et al. (2019) used hyperspectral remote sensing to establish plot-yield prediction models for field selection in large-scale soybean breeding programs. Specifically, they found that NDVI and RVI were the best combination of vegetation indices for plot-yield prediction in their models.
Chlorophyll is the primary component involved in plant photosynthesis and is closely related to biomass accumulation and yield formation, making it critically important for crop improvement. The rapid and non-destructive estimation of chlorophyll content facilitates genetic studies of chlorophyll. Chlorophyll content can be predicted using different wavelength spectra; for example, there is a strong correlation between the reflectance ratio of the near-infrared band to the 700-nm band and chlorophyll content (Gitelson et al., 2003). The hyperspectral sensor in the high-throughput phenotyping platform is often used to estimate the chlorophyll index (CHL), and this index has been widely used to evaluate chlorophyll content, crop biotic stress, and abiotic stress (Estrada et al., 2015;Awlia et al., 2016;Pérez-Bueno et al., 2019).
NDVI and CHL are both spectral reflectance physiological traits related to plant growth and yield. To evaluate the usefulness of these traits in breeding programs, knowledge of their variability and genetic basis in germplasms is essential. Previously, the measurement of these two physiological traits was tedious and often not possible using traditional instruments. Now, multispectral and hyperspectral images have greatly facilitated the measurement of these traits. The greenhouse high-throughput phenotyping platform (GHTPP) in the Plant Phenomics Research Center (PPRC) at Nanjing Agricultural University (NJAU) has been used by several studies. Phenotype data from the high-throughput phenotyping platforms of previous studies have primarily been used for the prediction of agronomic traits, such as plant yield (Maimaitijiang et al., 2020). However, there is a need for more studies to assess the genetic basis of high-throughput spectral reflectance phenotypes.
The aims of this study were the following: (i) characterize variation in two spectral reflectance physiological traits, NDVI and CHL, in the Chinese soybean germplasm population (CSGP), including wild accessions (WA), cultivated farmer landraces (LR), and released modern cultivars (RC), using the facilities and equipment of the GHTPP at the PPRC, NJAU, to compare wild and cultivated soybeans; (ii) explore genetic variation in QTLalleles through association mapping using the novel RTM-GWAS procedure and evolutionary changes from WA to LR and RC; (iii) predict the genetic potentials of the germplasm population through recombination among the accessions; and (iv) predict the candidate genes as well as the gene constitutions of NDVI and CHL in the CSGP based on information in SoyBase 1 .

Plant Materials and Experimental Design
A total of 341 soybean accessions of the CSGP, including 76 WAs, 83 LRs, and 182 RCs, were sampled in this study.
A randomized incomplete block design experiment with two replicates was conducted for high-throughput phenotyping. Because of the space limitations of the high-throughput phenotyping platform, the accessions were randomly divided into two groups. Two replicates of the phenotyping experiment were performed for the first group of 172 accessions on September 9, 2019 and October 13, 2019, and for the second group of 169 accessions on November 20, 2019 and May 1, 2020. For each test, approximately 4∼5 viable seeds were selected from each accession and were planted in a plastic pot ( 23 × H17 cm). The experimental soil was a 3:1 mixture of vermiculite and nutrient soil; one best soybean seedling remained in each pot on the seventh day after sowing. The temperature in the greenhouse was maintained between 25-33 • C, and light was provided for 16 hours (06:00 to 22:00).

High-Throughput Phenotyping
The greenhouse high-throughput phenotyping platform at the PPRC, NJAU, was used for phenotyping. The platform consisted of a planting area, irrigation area, and PSII (Photosynthetic System II) imaging room. An automatic and high-throughput transfer system was used to transfer plants from the planting/growing area to the imaging room. The PSII imaging room was equipped with a camera system (CropReporter, Phenovation B.V., Netherlands, https://www.phenovation.com/) with a CCD (charge-coupled device) camera, spectral LEDs (light-emitting diodes) for actinic treatment, an illuminated area of 70 cm × 70 cm, and a spectral range of 350-1000 nm. The spectral reflectance images were captured at six different wavelengths. From these images, the NDVI and CHL of the plant canopy for individual pots were estimated. According to CropReporter, the NDVI was calculated as (R NIR − R red )/(R NIR + R red ), and the CHL was calculated as R −1 700 − R −1 NIR , where R NIR , R red , and R 700 are the spectral reflectance in the near-infrared band, the visible red band, and the 700-nm wavelength band, respectively.
The platform was also equipped with the automatic experiment management software IS Agriware Logistics (Version2018.06.99, Indigo Logistics, Netherlands) to control system operation, CropReporter (Version 4.4.2, Phenovation B.V., Netherlands) to control the camera system, and the image analysis software Data Analysis (Version 5.4.8-64b, Phenovation B.V, Netherlands) to process the data. The two traits were unitless, as they represent relative values of reflectance. All of the NDVI and CHL values were obtained directly from the platform system.
The automatic phenotype measurements began on the sixth day after sowing (DAS6). Each pot with a plant was transferred from the planting area into the PSII imaging room to measure NDVI and CHL. Each pot was then returned to the planting/growing area. This phenotyping process was automatically executed every 3 days, and a total of nine measurements (DAS6, DAS9, DAS12, DAS15, DAS18, DAS21, DAS24, DAS27, and DAS30, which means the 6th, 9th,. . ., and 30th day counting from sowing, respectively) were taken throughout the experimental period for each accession type.

SNP Genotyping and SNPLDB Assembly
The 341 soybean accessions were genotyped with RAD-seq (restriction site-associated DNA sequencing) in previous studies (He et al., 2017;Fu et al., 2020;Liu et al., 2020). A total of 145,558, 82,966, and 98,482 SNPs were recovered in these three studies, respectively, and the intersection of SNP data from different studies was taken and filtered with a minor allele frequency > 2% (each allele is present in at least six individuals). A total of 44,931 SNPs were obtained and used in the present study.
The RTM-GWAS (restricted two-stage multi-locus genomewide association study) procedure (He et al., 2017) was used for QTL-allele detection in this study. With RTM-GWAS, a total of 11,716 multi-allelic SNPLDB markers were assembled based on the 44,931 genome-wide SNPs. The number of alleles of the SNPLDB markers ranged from 2 to 11 with an average of 3.1, enabling the detection of QTLs with up to 11 alleles per locus.

Statistical Analysis
The experiment consisted of an incomplete block design. The plot values were adjusted using the block means according to the equal block mean assumption because the material set in each block was randomly selected; therefore, the entire experiment was treated as a completely randomized design with two replicates. The linear model for the adjusted dataset was y i = µ + g i + ε i , where y i is the observed corrected phenotype of the i-th accession, µ is the population mean, g i is the genotypic effect of the i-th accession, and ε i is the random error following a normal distribution with a mean of zero and variance of σ 2 . The analysis of variance of the corrected phenotype data was performed using the PROC GLM in SAS/STAT 9.4 (SAS Institute, Cary, NC), and variance components were estimated using PROC VARCOMP with the REML method. The trait heritability estimate was calculated as h 2 = σ 2 g /(σ 2 g + σ 2 /r), where σ 2 g is the genetic variance, and r is the number of replicates. This h 2 is heritability in narrow sense because σ 2 g contains only additive and additive by additive genetic variance in selfpollinated soybean germplasm population.

Phenotype Data Selection
There were nine measurements (DAS6, DAS9, DAS12, DAS15, DAS18, DAS21, DAS24, DAS27, and DAS30) for each trait (NDVI or CHL) each accession. The trait heritability value was used to assess the goodness of the trait expression, and the measurement with a highest heritability value was chosen to represent the trait. Therefore, the variance components and heritability values were estimated based on analysis of variance for all the nine measurements, and the measurement with the highest trait heritability value was used for genome-wide association study.

Restricted Two-Stage Multi-Locus Genome-Wide Association Study (RTM-GWAS)
The RTM-GWAS method was used for QTL-allele detection (He et al., 2017). Briefly, RTM-GWAS first involved the construction of multi-allelic SNPLDB (SNP linkage disequilibrium block) markers by grouping multiple adjacent and tightly linked SNPs through the LD-block partition. Second, the genetic similarity coefficient matrix based on SNPLDB markers was used to correct for population structure bias by incorporating its eigenvectors as model covariates. Finally, two-stage association analysis was conducted to detect QTLs and their corresponding multiple alleles based on a multi-locus multi-allele model. The linear model of RTM-GWAS in matrix form is y = 1µ + Wa + Xb + e, where y is the phenotype, µ is the population mean, W is the eigenvector matrix representing the population structure, X is the design matrix of locus genotype, a and b are vectors of corresponding effects. At the first stage, X includes only a single SNPLDB marker for pre-selection. At the second stage, X includes multiple SNPLDB markers for multi-locus modeling. In the present study, at the first stage under the single locus model, 1,296 and 2,147 SNPLDBs were pre-selected from the 11,716 SNPLDBs for the second stage of stepwise regression association analysis under the multi-locus model for NDVI and CHL, respectively. As the RTM-GWAS was based on the multilocus model having built-in control for the experiment-wise error rate, a normal significance level of 0.05 was used for QTL detection in this study.

Prediction of the Genetic Potential of NDVI and CHL in the CSGP
To predict the recombination potential of the population, all possible single crosses among entire accessions, among subpopulation accessions, and between subpopulation accessions were simulated in silico (He et al., 2017). For each cross, 2,000 inbred lines were derived, and the phenotypes were predicted for each line according to the QTL-allele matrix. Finally, the recombination potential of each cross was assessed using the 99th percentiles of the predicted phenotype data.

Annotation of Candidate Genes and GO Analysis of NDVI and CHL
According to SoyBase(see footnote 1), the candidate genes for NDVI and CHL were annotated from the identified QTLs. Next, the annotated candidate genes were subjected to gene ontology (GO) analysis using the Williams 82 genome version 1 (Wm82.a1.v1.1) as the reference genome. The candidate genes were searched within the interval (with a 50-kb flanking expansion) of the associated loci. In order to have a preliminary validation of the annotated candidate genes, the RNA Seq-Atlas project data set in SoyBase (see footnote 1) was downloaded and analyzed to assess the expression level of the annotated genes for NDVI and CHL.

Phenotypic Variation of NDVI and CHL in the CSGP
Phenotype measurements for each accession were taken nine times on different days during growth (DAS6, DAS9, DAS12, and DAS30); the trait heritability at each measurement ranged between 16.0-78.3% and 25.4-69.2% for NDVI and CHL, respectively. The NDVI at DAS21 and CHL at DAS24 had the highest heritabilities and were thus examined in subsequent analyses.
The frequency distribution showed that the NDVI ranged from 0.05 to 0.18 with an average of 0.11 ( Table 1). The entire population was separated into WA, LR, and RC subpopulations; the mean NDVI of the WA was relatively small (0.07) with values ranging from 0.05-0.13. The mean NDVI of LR and RC was 0.11 and 0.12, respectively, with values ranging from 0.06-0.18 and 0.06-0.17, respectively. The CHL frequency distribution of the entire population ranged from 1.20 to 4.78, with an average of 3.57. The CHL mean of WA was 3.08 and ranged from 1.20-4.49; the CHL mean of LR and RC was 3.68 and 3.72, respectively, and ranged from 2.73-4.78 and 2.67-4.50, respectively ( Table 1).
Both NDVI and CHL, two physiological traits related to photosynthesis and growth, significantly differed between wild (0.07, 3.08) and cultivated (0.11-0.12, 3.68-3.72) soybeans, suggesting that cultivated soybeans have experienced significant improvements in photosynthesis-and growth-related traits following their domestication (Table 1). Thus, these basic shortcomings of wild soybean should not be neglected when wild soybeans are used to improve cultivated soybeans.
The analysis of variance revealed significant differences among accessions for both NDVI and CHL, indicating that there was significant genetic variation for the two spectral reflectance traits (Supplementary Table 1). The trait heritability was estimated to be 78.3% for NDVI and 69.2% for CHL (Table 1). These findings indicate that phenotypic variation (PV) was primarily driven by genetic factors, and the underlying QTLs or genes could be traced through further genetic analysis.

Identification of the QTL-allele System Determining NDVI and CHL in the CSGP
According to the RTM-GWAS procedure involving the use of 1,296 and 2,147 multi-allelic SNPLDB markers preselected at the first stage for the second stage multi-locus multi-allele association analysis, a total of 38 and 32 SNPLDBs, each with 2-4 and 2-6 alleles were determined to be significantly associated with NDVI and CHL, respectively ( Table 2, Supplementary Tables 2, 4).
The 38 NDVI-associated loci explained 48.39% of the phenotypic variation (PV), among which 17 large-contribution loci (R 2 ≥ 1%) explained 34.77% of the PV and 21 smallcontribution loci (R 2 < 1%) explained 13.62% of the PV ( Table 2). The phenotypic contribution of each associated locus to PV ranged between 0.37-3.84%. These loci were distributed on 15 chromosomes with 1 to 5 loci on each chromosome; chromosome 1 had the most loci (Figures 1A-C). In NDVI, the total genetic variation (heritability) was 78.3%, and the genetic contribution of the detected 38 QTLs was 48.39%; consequently, 29.91% of the genetic variation was not detected, which can be attributed to a collection of unmapped QTLs that needs to be further explored under controlled conditions where experimental error is minimized.
The CHL-associated loci had a similar pattern to those of NDVI ( Table 2). The PV explained by the CHL-associated loci was 51.35%, among which 19 large-contribution loci explained 42.95% of the PV and 13 small-contribution loci explained 8.40% of the PV; the number of large-contribution loci was greater than the number of small-contribution loci. The PV of each associated locus ranged from 0.38 to 5.38%. These loci were distributed on 18 chromosomes with 1-6 loci on each chromosome; chromosome 5 had the most loci (Figures 1D-F).
In CHL, the total genetic variation (trait heritability) was 69.2%, and the genetic contribution of the detected 32 QTLs was 51.35%; therefore, 27.85% of the genetic variation was not detected and will require further study to elucidate. For other agronomic traits, such as 100-seed weight, days to flowering, and drought tolerance, the number of alleles per locus detected have been reported to range from 2-10 (He et al., 2017), 2-10 (Fu et al., 2020), and 2-12 , respectively. By comparison, the alleles per locus of NDVI and CHL in the present study were only 2-4 and 2-6, respectively, and the permarker number of alleles was 2-11. The differences observed in these two spectral reflectance physiological traits potentially indicate that genetic differentiation at single loci is less likely for these two traits compared with other agronomic traits.

QTL-allele Matrices of NDVI and CHL and Their Evolution From WA to LR and RC
The RTM-GWAS method provides a powerfull approach for the detection of genome-wide QTLs and their multiple allele effects. In this study, the effects of the 2-4 alleles per locus for a total of 89 alleles on 38 loci were obtained for NDVI. These QTLalleles of the 341 accessions can be organized into a 38 × 341 (locus × accession) matrix (Figure 1G), which represents a compact form of the genetic structure of the population and was designated as the QTL-allele matrix of NDVI. Similarly, the effects of the 2-6 alleles per locus for a total of 82 alleles on 32 loci and the 32 × 341 (locus × accession) matrix were obtained for CHL ( Figure 1I). The QTL-allele matrix detected from the RTM-GWAS contained all of the genetic constitutions of a trait in a population and can thus be used for the study of population genetic differentiation. The cultivated soybean is generally thought to have been domesticated from annual wild soybean, with released cultivars developed from farmer landraces . The QTL-allele matrix can be separated into its component matrices to facilitate the tracing of evolutionary genetic changes from WA to LR and RC. For NDVI, there were 85 alleles on 38 loci in WA; 82 wild alleles on 38 loci were passed to LR, with the emergence of 2 new alleles on 2 loci and the exclusion of 3 alleles on 3 loci for a total of 84 alleles on 38 loci ( Table 3). From LR, 82 alleles on 38 loci were passed to RC, with the emergence of 2 new alleles on 2 loci, the recovery of 2 wild alleles on 2 loci, and the exclusion of 2 alleles on 2 loci for a total of 86 alleles on 38 loci. In LR + RC, 84 alleles on 38 loci were inherited from WA, including the emergence of 4 new alleles on 4 loci and the exclusion of 1 allele on 1 locus for a total of 88 alleles on 38 loci. Among the 4 newly emerged alleles on 4 loci in the cultivated LR + RC, 2 new loci with 2 new alleles emerged in LR + RC. In LR vs. WA, 1 of the 2 newly emerged alleles was in the newly formed QTL qNdvi-01-5 in LR, which was not polymorphic in WA; in RC vs. LR, 1 of the 2 emerged alleles was in the newly formed QTL qNdvi-03-1 in RC and was not polymorphic in WA and LR (Figures 2, 3; Table 3).
Similar results were obtained for CHL. There were 76 alleles on 32 loci in WA; 76 wild alleles on 32 loci were passed to LR, with the emergence of 4 new alleles on 4 loci for a total of 80 alleles on 32 loci; no alleles were excluded ( Table 3). From LR, 80 alleles on 32 loci were passed to RC, with the emergence of 2 new alleles on 2 loci and the exclusion of 1 allele on 1 locus for a total of 81 alleles on 32 loci; no wild alleles were recovered. In LR + RC, 76 alleles on 32 loci were inherited from WA, with the emergence of 6 new alleles on 5 loci for a total of 82 alleles on 32 loci; no wild alleles were excluded.
All of the emerged and excluded alleles and their associated QTLs are listed in Table 4. For NDVI, there were 4 newly emerged alleles on 4 loci (qNdvi-01-5, qNdvi-02-1, qNdvi-03-1, and qNdvi-15-2); qNdvi-01-5 and qNdvi-03-1 were also newly formed in LR and RC, respectively. One allele in qNdvi-05-4 was excluded in LR, and 1 allele in qNdvi-05-3 was excluded in RC. For CHL, there were 6 newly emerged alleles on 5 loci (qChl-06-2, qChl-08-3, qChl-18-2, qChl-19-1, and qChl-20-1), and 1 allele on qChl-01-1 was excluded in RC. The allele frequencies of the newly emerged alleles in LR + RC ranged between 3.40-13.58%, and the new alleles were not dominant over older alleles. Thus, genetic changes were limited during the evolution from WA to LR and RC, as the three subpopulations shared a large number of common alleles. Among the 89 wild alleles of 38 NDVI-associated loci, 80 alleles were shared among the three subpopulations and among the 76 wild alleles of 32 CHLassociated loci, and 75 wild alleles were shared among the three subpopulations (Figure 3). Here, the total change (emerged plus excluded) in alleles (5 alleles (5.7%) on 5 loci (13.2%) for NDVI and 6 alleles (7.3%) on 5 loci (15.6%) for CHL) was much lower relative to the changes in alleles observed for other agronomic traits. For example, there were a total of 261 alleles on 75 loci in Chinese cultivated soybeans for drought tolerance, and 46 alleles (17.6%) on 27 loci (36.0%) were changed in RC relative to LR . In addition, there were a total of 342 alleles on 81 loci in Northeast China soybeans for earliness, and 143 alleles (41.8%) on 67 loci (82.7%) were changed in the early group relative to the late group (Fu et al., 2020).   In summary, these two spectral reflectance physiological traits (NDVI and CHL) were genetically conservative. Inheritance played a major role in determining the genetic motivation, For NDVI, 4 alleles on 4 loci emerged; for CHL, 6 alleles on 5 loci emerged. Two new loci emerged for NDVI, but none emerged for CHL. For NDVI, only 1 allele on 1 locus was excluded, whereas no alleles were excluded for CHL. The transition from WA to LR and from LR to RC took approximately 5,000 and 100 years, respectively; despite this long history, large genetic changes have not occurred, especially during the transition from WA to LR. This genetic stability indicates that the two physiological traits NDVI and CHL are highly conservative compared with other agronomic traits. However, the same number of genetic changes occurred during the transition from WA to LR (5,000 years) and the transition from LR to RC (100 years), indicating that the enhanced artificial breeding in the transition from LR to RC accelerated the rate of genetic change.

Prediction of the Recombination Potential of NDVI and CHL in the CSGP
To assess the recombination potential for NDVI and CHL in the CSGP, a total of 57,970 possible single crosses among the 341 accessions were simulated based on the QTL-allele matrix; possible crosses between the accessions for each subpopulation were also simulated. The 99th percentile of 2,000 progenies of each cross was used to represent the recombination potential ( Table 5). For NDVI, the recombination potential within the WA, LR, and RC was not large (0.15 vs. 0.13 of the extreme phenotype, 0.20 vs. 0.18 of the extreme phenotype, and 0.20 vs. 0.17 of the extreme phenotype, respectively), but the predicted value was larger in LR and RC (the superior subpopulations) than in WA. Among the three between-subpopulation crosses, the highest recombination potential was observed for LR × RC (0.21 vs. 0.19 in WA × LR and WA × RC); for the crosses at the entire population level, the NDVI was 0.21, which was the same as that observed for LR × RC (Table 5, Figure 1H).
For CHL, the recombination potentials within WA, LR, and RC were also not large (4.93 vs. 4.49 of the extreme phenotype, 5.31 vs. 4.78 of the extreme phenotype, and 5.06 vs. 4.50 of the extreme phenotype, respectively), but the predicted value was larger in LR and RC (the superior populations) than in WA. Among the three between-subpopulation crosses, the highest recombination potential was observed for WA × LR (5.41 vs. 5.38 and 5.21 in WA × RC and LR × RC, respectively); for the crosses at the entire population level, the CHL was 5.41, which was the same as that observed for WA × LR (Table 5, Figure 1J). Thus, The cells with lowercase letters l and r are alleles excluded in LR and RC (vs. WA), respectively. The uppercase letters L and R in cells are alleles that emerged in LR and RC (vs. WA), respectively. In addition, the QTL qNdvi-01-5 emerged in LR and was not polymorphic in WA. The QTL qNdvi-03-1 emerged in RC and was not polymorphic in WA and LR. crosses with the WA had a greater recombination potential for CHL, which was opposite to the pattern observed for NDVI.

Annotation of Candidate Genes and GO Analysis of NDVI and CHL in the CSGP
From the detected QTLs, a total of 39 candidate genes were annotated on 20 NDVI-associated loci, and 32 candidate genes on 22 CHL-associated loci (Supplementary Table 3). Only 9 candidate genes were annotated on 7 large-contribution loci of NDVI, but most (21 out of 32) of the CHL-related candidate genes were annotated on 13 large-contribution loci of CHL. Gene ontology (GO) analysis revealed that these candidate genes for both NDVI and CHL can be classified into three categories: biological process, molecular function, and cellular component (Table 6, Supplementary Figure 1). In biological process, NDVI involved 14 of 15 function groups, and CHL involved 11 of 15 function groups with five group differences. In molecular function, NDVI and CHL both involved 4 of 5 groups with two group differences. In cellular component, both NDVI and CHL involved all of the 5 function groups (Table 5, Supplementary  Figure 1). The candidate gene systems of NDVI and CHL both involved a similar set of genes, although their frequency distributions differed. The two genetic systems consisted of a series of genes involved in a complex gene network.
The validation of these candidate genes are left for further studies, however, a preliminary verification was carried out using the transcriptome data set of RNA Seq-Atlas project in SoyBase (see footnote 1). The gene expression results ( Supplementary  Figure 2) showed that 31 out of the 39 annotated genes for NDVI were expressed in 14 soybean tissues, among which Glyma05g23230 and Glyma14g06630 showed especially high expression level. For CHL, there were 28 out of 32 annotated genes were expressed in 14 soybean tissues, and Glyma08g10960 showed high expression level in young leaf and pod shell, indicating these identified candidate genes are possibly functional.

Efficiency of High-Throughput Phenotyping Integrated With RTM-GWAS in Identifying NDVI and CHL QTL-allele Systems
This study presented a genetic analysis of two spectral reflectance traits, NDVI and CHL, using a high-throughput phenotyping platform. Both NDVI and CHL showed significant genetic variation in CSGP, indicating that the spectral reflectance phenotyping data can not only be used for predicting agronomic traits but also for dissecting their underlying genetic basis. In this study, 89 alleles on 38 loci for NDVI and 82 alleles on 32 loci for CHL were detected, and the RTM-GWAS method was used to characterize their allele effects. High-throughput phenotyping integrated with RTM-GWAS was an efficient method for identifying the QTL-allele systems for NDVI and CHL. However, only 48.36% and 51.35% of the PV for NDVI and CHL were explained by the detected loci, which is low compared with other agronomic traits, such as 100-seed weight (139 QTLs explained 98.2% of the PV with a heritability of 98.9%, He et al., 2017). Although the large-contribution QTLs have been identified, many small-contribution QTLs consisting of unmapped minor QTLs have yet to be identified according to the RTM-GWAS. This observation might stem from experimental error given that the heritability values were only 78.3% and 69.2% for NDVI and CHL, respectively. Such error might have decreased the sensitivity of our analysis to detect QTLs, leaving 29.91% and 27.85% of the genetic variation (presumably unmapped minor QTLs) undetected. In the present study, an incomplete block design was conducted to separate all accessions into two sets for two respective tests to circumvent the space limitations associated with the high-throughput phenotyping platform. Although we employed a method to make the environment uniform between the different tests, much room for improvement remains.
In the present study, the PV explained by individual QTL ranged between 0.37-3.84% for NDVI, and 0.38-5.38% for CHL. There were 17 and 19 large-contribution (R2 ≥ 1%) and 21 and 13 small-contribution (R2 < 1%) QTLs for NDVI and CHL, respectively. The PV in RTM-GWAS is relatively lower than that in single-locus model such as the mixed linear model method (Yu et al., 2006). In single-locus model, association test is performed for each locus individually, and the estimated contribution of a locus may be inflated obviously due to the correlations among neighboring loci (He et al., 2017). But in RTM-GWAS, multiple QTLs are jointly fitted in a multi-locus model and the estimated PV for each QTL is unbiased and the total PV is controlled within  heritability value, therefore, the individual QTL in RTM-GWAS may look smaller than those from single locus model procedure.
The fact of many QTLs each with a smaller PV is a characteristic of a quantitative trait controlled by a large number of QTLs. Or as we understand, the total PV of a quantitative trait in fact is a projection of a large number of genes/QTLs with different but interrelated biological functions onto the trait plane in a specific population. The statistically estimated genetic effect or PV of a QTL is relative to and largely depends on the genetic background of the population. A same QTL may exhibit varying effects in different populations with different genetic background. For example, the PV of a QTL in a simple genetic background such as near-isogenic lines is much greater than that in a germplasm population. Therefore, the small-contribution QTL in a study may exhibit large effects in other populations with simple genetic background. In fact, the purpose of the RTM-GWAS method is to achieve a relatively thorough detection of whole-genome QTLs and their multiple alleles or the QTL-allele system rather than a few individual large PV QTLs. Thus for the genetic improvement of quantitative traits in plant breeding, background control and foreground control are both important. It is likely that breakthroughs can be achieved through increase of positive alleles and decrease of negative alleles among multiple loci, rather than through recombination between/among a few loci like in the qualitative trait situation.
Genetic Conservativeness of NDVI and CHL, Their Improvement Potential and Implications to Breeding for Seed-Yield of Soybeans Both phenotypic and genotypic analysis showed that the two spectral reflectance traits were genetically conservative in comparison to the agronomic traits, such as seed yield, 100 seed weight, days to flowering (He et al., 2017;Zhang et al., 2019;Fu et al., 2020). Because in the present results, (i) no significant phenotypic improvements were observed in RC, and trait values were low in LR and RC; (ii) there was a limited number of alleles per locus; (iii) there was a large number of shared wild alleles among WA, LR, and RC, few new alleles, and little exclusion of wild alleles; and (iv) the recombination potential was low. In other words, the two spectral reflectance traits, NDVI and CHL, were more conservative than other agronomic traits in their genetic changes. We suspect that these spectral reflectance traits are upstream traits, whereas the other agronomic traits are downstream traits, the upstream traits may be more conservative than the downstream traits due to more factors may influence the downstream traits. For example, NDVI and CHL are traits related to the biological process of photosynthesis or organic synthesis while the agronomic traits such as seed yield may relate to the biological processes of transportation and storage of organics in addition to organic synthesis. The fact that breeding generally acts on downstream traits more readily compared to upstream traits may explain why the latter was more conservative and with less phenotypic improvements. Thus, additional effort is needed to improve upstream traits, such as NDVI and CHL, which are involved in light interception, light function and therefore, in photosynthesis and organics production.
However, some potential for improvement in NDVI and CHL was observed from WA to LR + RC, although the improvement was small ( Table 1). The genetic mechanism underlying the observed evolutionary improvements might be recombination among loci-alleles given that all of the wild alleles passed to LR + RC except one negative wild allele excluded; furthermore, new alleles did not make up a large proportion of the alleles, given that few new alleles emerged (Tables 3, 4). This point is supported by the optimal cross prediction that the recombination among the loci/accessions might result in transgressive progenies, i.e., approximately 13-16% of genetic progress for NDVI and CHL might be achieved through hybridization in the CSGP ( Table 5).
The previous studies on high-throughput phenotypes in crops usually focused on predicting yield-related agronomic traits, such as plant height, biomass and seed yield (Holman et al., 2016;Salas Fernandez et al., 2017, Maimaitijiang et al., 2020. For example, our previous results showed that NDVI was selected as the best vegetation indices in the establishment of plot-yield prediction models in breeding programs of soybeans (Zhang et al., 2019). Here in the present study, genetic dissection of the two high-throughput physiological traits, NDVI and CHL, was performed based on the high-throughput phenotyping technique. As NDVI and CHL are upstream traits and agronomic traits are breeding-acted target traits, identifying the genetic system of upstream traits may help to understand the genetic mechanism of downstream targets and also may provide additional control of downstream targets. For example, it was reported that NDVI was also a proxy for drought-adaptive traits in durum wheat, and high-throughput data collection of NDVI with capable precision can facilitate the genetic dissection of drought-adaptive traits (Condorelli et al., 2018). Thus, in breeding programs, breeders can combine the selection for upstream traits using high-throughput phenotyping data and the selection for downstream traits using agronomic data to have both selected and improved, which might benefit the enhanced selection of the downstream traits. This explains the reason that we suggested in yield breeding programs to combine the selection before harvest using NDVI prediction models established from hyperspectral reflectance data and the selection of harvested yield to achieve an enhanced selection for genotypic yield (Zhang et al., 2019).

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://github.com/ njau-sri/leiwang2020ndvichl.