Edited by: Guo-Bo Chen, Zhejiang Provincial People's Hospital, China
Reviewed by: Suhong Bu, Fujian Agriculture and Forestry University, China; Yuan-Ming Zhang, Huazhong Agricultural University, China
*Correspondence: Dechun Wang
Lijuan Qiu
This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Plant Science
†These authors have contributed equally to this work.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Soybean is one of the most important economic crops for both China and the United States (US). The exchange of germplasm between these two countries has long been active. To investigate the genetic relationships between Chinese and US soybean germplasm, 277 Chinese and 300 US soybean accessions from geographically diverse regions were analyzed using 5,361 SNP markers. The genetic diversity and polymorphism information content (PIC) of the Chinese accessions were higher than those of the US accessions. Population structure analysis, principal component analysis, and cluster analysis all showed that the genetic basis of Chinese soybeans is distinct from that of US soybeans. The groupings observed in the cluster analysis reflected the geographical origins of the accessions, a conclusion that was validated by both genetic distance analysis and relative kinship analysis.
Soybean originated in China, where it has been cultivated for more than 4,000 years (Hymowitz and Newell,
There are extensive soybean breeding programs in both China and the USA, most of which rely on a genetic base of Chinese origin (Cui et al.,
The pattern of genetic variation in soybean germplasm resources from China and the USA has been evaluated using pedigree information (Cui et al.,
High-throughput genotyping technologies for detecting DNA sequence variation have advanced considerably and are now used routinely in crop germplasm research and breeding (Akond et al.,
This study examined soybean cultivars and advanced breeding lines: 277 from China (hereafter termed the CN-set) and 300 from the USA (hereafter termed the US-set). Of the 277 Chinese accessions, collected from 11 provinces, 231 were derived from the Northern spring (Nsp) ecotype and 46 from the Huang-huai-hai summer (Hsu) ecotype. Detailed information on the 277 accessions can be found in Supplementary Table
Geographic distribution of soybean accessions in China
Genomic DNA was extracted from the leaves of soybean seedlings following the protocol of Kisha et al. (
Using PowerMarker 3.25 (Liu and Muse,
Three multivariate analyses, including model-based population structure analysis, principal component analysis (PCA), and cluster analysis with a neighbor-joining algorithm, were employed to divide the soybean accessions into subgroups. The Bayesian model-based program STRUCTURE 2.3 (Pritchard et al.,
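The PCA step can be illustrated with a small pure-Python sketch. It assumes genotypes coded as allele dosages (0, 1, 2) with no missing calls, centers each SNP, and extracts the first principal component by power iteration on the accession-by-accession Gram matrix. This is not the software used in the study, and the function name `pc1_scores` is hypothetical; for a panel of 577 accessions and 5,195 SNPs, a compiled linear-algebra library would be the practical choice.

```python
import math
import random


def pc1_scores(genotypes, iters=200, seed=0):
    """First principal-component scores for each accession (hypothetical helper).

    genotypes: list of rows (accessions), each a list of allele dosages
    (0, 1, 2) with no missing values. Returns one score per accession;
    the overall sign of a principal component is arbitrary.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Center every SNP column on its mean dosage
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # The Gram matrix X X^T shares its non-zero eigenvalues with X^T X
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        # Power iteration: multiply by the Gram matrix, then normalize
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0:
            return [0.0] * n
        v = [t / norm for t in w]
    # Rayleigh quotient gives the leading eigenvalue; score = eigenvector * sqrt(eigenvalue)
    lam = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n)) for i in range(n))
    return [t * math.sqrt(max(lam, 0.0)) for t in v]
```

For a toy panel such as `[[2, 2, 0], [2, 2, 0], [0, 0, 2], [0, 0, 2]]`, the first two accessions receive one score and the last two the opposite score, mirroring how PC1 can separate two genetically distinct groups.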
Three statistical methods were used to detect the loci under selection. (1) Differences in allele frequency between the CN-set and the US-set were tested by Student's
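The first test can be sketched as a pooled-variance two-sample Student's t statistic computed for a single SNP. The sketch assumes that each accession is scored by its frequency of the tested allele (0.0, 0.5 for a heterozygote, or 1.0). The exact coding and test settings used in the study are not given in this section, and `student_t` is a hypothetical name.

```python
import math
from statistics import mean, variance


def student_t(freqs_cn, freqs_us):
    """Pooled-variance two-sample Student's t statistic for one SNP.

    freqs_cn, freqs_us: per-accession frequency of the tested allele
    (0.0, 0.5 for a heterozygote, or 1.0). Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(freqs_cn), len(freqs_us)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two accessions")
    diff = mean(freqs_cn) - mean(freqs_us)
    # Pooled sample variance (the equal-variance assumption of Student's test)
    pooled = ((n1 - 1) * variance(freqs_cn) + (n2 - 1) * variance(freqs_us)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    if se == 0:
        # Both groups fixed: 0 if identical, otherwise an infinite statistic
        return (0.0 if diff == 0 else math.copysign(math.inf, diff)), df
    return diff / se, df
```

The p-value would then come from the t distribution with `df` degrees of freedom; that lookup is left out to keep the sketch dependency-free.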
A total of 577 soybean accessions were analyzed using the 5,361 SNP markers of the Illumina SoySNP6k iSelect BeadChip. SNP markers with more than 20% missing data across the accessions (166 SNPs) were excluded, leaving 5,195 SNPs (96.90%) for subsequent analysis. In the CN-set, the average PIC was 0.2643 (range 0 to 0.3750) and the average genetic diversity was 0.3307 (range 0 to 0.5000). In the US-set, the average PIC was 0.2408 (range 0 to 0.3750) and the average genetic diversity was 0.2988 (range 0 to 0.5000) (Table
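The marker filtering and the per-SNP summary statistics reported above can be sketched as follows. The sketch assumes calls coded as allele dosages (0, 1, 2) with `None` for missing data, and it uses the uncorrected textbook formulas: gene diversity 1 − Σp², and PIC following Botstein et al. PowerMarker's exact estimators may differ, and the helper names are hypothetical.

```python
def snp_stats(calls):
    """Return (MAF, gene diversity, PIC) for one biallelic SNP.

    calls: per-accession allele dosage (0, 1, 2) of one allele; None marks a
    missing call. Uses the uncorrected textbook formulas.
    """
    observed = [c for c in calls if c is not None]
    p = sum(observed) / (2 * len(observed))  # frequency of the counted allele
    freqs = (p, 1.0 - p)
    maf = min(freqs)
    gene_div = 1.0 - sum(f * f for f in freqs)  # 1 - sum of squared allele frequencies
    # PIC: subtract 2 * p_i^2 * p_j^2 for the single allele pair of a biallelic SNP
    pic = gene_div - 2 * freqs[0] ** 2 * freqs[1] ** 2
    return maf, gene_div, pic


def filter_and_summarize(panel, max_missing=0.20):
    """Drop SNPs with more than max_missing missing calls; summarize the rest.

    panel: dict mapping SNP id -> list of per-accession calls.
    """
    kept = {}
    for snp_id, calls in panel.items():
        missing = sum(c is None for c in calls) / len(calls)
        if missing <= max_missing:
            kept[snp_id] = snp_stats(calls)
    return kept
```

A SNP whose two alleles are equally frequent (p = 0.5) reaches the maxima of both statistics, a gene diversity of 0.5 and a PIC of 0.375. This is why those values appear as the upper bounds of the reported ranges.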
Genetic parameters revealed by the analysis of 5,195 polymorphic SNP markers among the soybean accessions of the diversity panel.
CN | 0.2498 (0~0.5000) | 0.3307 (0~0.5000) | 0.0658 (0~0.8654) | 0.2643 (0~0.3750) |
US | 0.2209 (0~0.5000) | 0.2988 (0~0.5000) | 0.0762 (0~0.9588) | 0.2408 (0~0.3750) |
CN+US | 0.2679 (0.0026~0.5000) | 0.3489 (0.0052~0.5000) | 0.0711 (0~0.7750) | 0.2769 (0.0052~0.3750) |
Distribution of the minor allele frequency (MAF;
Linkage disequilibrium (LD) analysis was performed with the 5,195 SNPs in both the CN-set and the US-set. The
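One common way to quantify LD and its decay is to compute r² for SNP pairs and average it within physical-distance bins. The sketch below approximates r² as the squared Pearson correlation of allele dosages, which is reasonable for largely homozygous inbred accessions. The software, estimator, and bin sizes used in the study are not stated here, and the function names are hypothetical.

```python
from itertools import combinations


def r_squared(snp_a, snp_b):
    """Squared Pearson correlation between two SNPs' allele dosages.

    Accessions missing either call are skipped. Returns nan if either SNP
    is monomorphic among the remaining accessions.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return float("nan")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov * cov / (var_a * var_b)


def mean_r2_by_distance(snps, positions, bin_size):
    """Average r^2 of SNP pairs on one chromosome, grouped by distance bin.

    snps: list of per-accession dosage lists; positions: matching bp positions.
    Returns {bin_start_bp: mean r^2}, sorted by bin start.
    """
    sums, counts = {}, {}
    for i, j in combinations(range(len(snps)), 2):
        r2 = r_squared(snps[i], snps[j])
        if r2 != r2:  # nan is the only value not equal to itself
            continue
        start = abs(positions[i] - positions[j]) // bin_size * bin_size
        sums[start] = sums.get(start, 0.0) + r2
        counts[start] = counts.get(start, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}
```

Plotting the binned means against distance gives an LD decay curve. The distance at which the mean r² falls below a chosen threshold is one way to compare decay between two sets of accessions.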
Linkage disequilibrium (LD) decay in Chinese and US soybean accessions. CN, China; US, United States of America.
We investigated the population structure without introducing any prior information or assumptions. Population clustering was performed using the STRUCTURE program. The log-likelihood increased continuously, with no obvious inflection point (Figure
Genetic structure and relatedness of populations from China and the USA.
To validate these results and gain further insight into the genetic diversity of the soybean germplasm panel, we constructed a neighbor-joining tree based on the frequency of shared alleles among the accessions. The 577 soybean accessions were classified into two major groups (Figure
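The input to such a tree is a pairwise distance matrix. The sketch below computes one minus the proportion of shared alleles, assuming diploid calls coded as dosages (0, 1, 2). At each locus, two genotypes share 2 − |g1 − g2| of their two alleles. The tree-building step itself (neighbor joining on this matrix) would be handled by a phylogenetics package and is not shown. The helper names are hypothetical.

```python
def shared_allele_distance(g1, g2):
    """1 - proportion of shared alleles between two diploid accessions.

    g1, g2: allele dosages (0, 1, 2) at the same SNPs; a locus is skipped if
    either call is None. At each locus the genotypes share 2 - |g1 - g2|
    of their 2 alleles.
    """
    loci = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    shared = sum(2 - abs(a - b) for a, b in loci)
    return 1.0 - shared / (2 * len(loci))


def distance_matrix(panel):
    """Symmetric shared-allele distance matrix for a list of accessions."""
    n = len(panel)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = shared_allele_distance(panel[i], panel[j])
    return d
```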
Our population structure analysis (performed without prior information), neighbor-joining tree, and PCA all clearly indicated the existence of two distinct subpopulations among the 577 accessions in this study. Clear divergence existed between the Chinese and US soybean accessions: most accessions from the same country clustered into the same genetic group, and fewer than 7% were grouped into the other country's subpopulation. Therefore, subsequent analyses were based on two pre-defined genetic pools (i.e., the original Chinese and US soybean populations).
Genetic differences between the pre-defined genetic pools were evaluated using four types of population differentiation analysis: AMOVA, Rogers' genetic distance, genetic relatedness, and allele frequency. AMOVA indicated that 19.33% of the total genetic variation occurred among the subpopulations, whereas 80.67% occurred within them. The population pairwise
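Rogers' distance between two populations can be sketched from their allele frequencies. The sketch uses the Rogers (1972) form: the per-locus distance sqrt(0.5 × Σ(p − q)²), averaged over loci. For a biallelic SNP, this per-locus distance reduces to |p − q|. Whether the study used this form or the modified Rogers' distance is not stated in this section, and `rogers_distance` is a hypothetical name.

```python
import math


def rogers_distance(pop1, pop2):
    """Rogers' (1972) genetic distance between two populations.

    pop1, pop2: one entry per locus, each a list of allele frequencies in the
    same allele order (each list sums to 1). The per-locus distance is
    sqrt(0.5 * sum((p - q)^2)), averaged over loci.
    """
    if len(pop1) != len(pop2):
        raise ValueError("populations must be scored at the same loci")
    per_locus = [
        math.sqrt(0.5 * sum((p - q) ** 2 for p, q in zip(f1, f2)))
        for f1, f2 in zip(pop1, pop2)
    ]
    return sum(per_locus) / len(per_locus)
```

The distance ranges from 0 (identical allele frequencies at every locus) to 1 (alternative alleles fixed in the two populations at every locus).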
Relative kinship reflects the approximate degree of identity between two given accessions (Figure
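The relative-kinship estimator is not specified in this section. As a swapped-in illustration, the sketch below builds a VanRaden-style genomic relationship matrix, which is on the scale of twice the kinship coefficient. It assumes complete dosage calls (0, 1, 2), truncates negative estimates to zero as is often done in association-mapping studies, and reports the share of accession pairs whose estimate is exactly zero (the kind of summary quoted later for the 577 accessions). Both function names are hypothetical.

```python
def genomic_relationship(genotypes):
    """VanRaden-style genomic relationship matrix with negatives set to zero.

    genotypes: rows = accessions, columns = SNP dosages (0, 1, 2), no missing
    values. Allele frequencies are estimated from the panel itself.
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic in this panel")
    # Center each dosage on its expected value 2p
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            value = sum(a * b for a, b in zip(z[i], z[j])) / scale
            k[i][j] = max(value, 0.0)  # truncate negative estimates to zero
    return k


def fraction_zero(k):
    """Share of off-diagonal accession pairs whose estimate is exactly zero."""
    n = len(k)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(k[i][j] == 0.0 for i, j in pairs) / len(pairs)
```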
The distribution of allele-frequency differences (denoted by
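The allele-frequency difference f1 − f2 (CN-set minus US-set) and the ranking behind a top-10 list of candidate loci can be sketched as follows. For example, ss245627275 has f1 = 0.8967 and f2 = 0.0103 in the candidate table. The sketch assumes dosage-coded calls with `None` for missing data and ranks SNPs by |f1 − f2| only. The study combined three tests, so this covers just one of them, and the helper names are hypothetical.

```python
def allele_frequency(calls):
    """Frequency of the counted allele from dosages (0, 1, 2); None = missing."""
    observed = [c for c in calls if c is not None]
    return sum(observed) / (2 * len(observed))


def rank_by_frequency_difference(cn_panel, us_panel, top=10):
    """Rank shared SNPs by |f1 - f2| (f1 = CN-set, f2 = US-set), largest first.

    cn_panel, us_panel: dict mapping SNP id -> list of per-accession dosages.
    Returns up to `top` tuples (snp_id, f1, f2, f1 - f2).
    """
    rows = []
    for snp_id in cn_panel.keys() & us_panel.keys():
        f1 = allele_frequency(cn_panel[snp_id])
        f2 = allele_frequency(us_panel[snp_id])
        rows.append((snp_id, f1, f2, f1 - f2))
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows[:top]
```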
Distribution of the differentiation of allele frequencies, f1-f2
Distribution of pairwise
The top 10 candidate loci under selection are presented in Table
The top 10 SNPs with significantly (
ss245627275 | 5 | A | 8234203 | 0.8967 | 0.0103 | 0.8863 | 0.8866 | 148.3306 | 342.8146 | |
ss246094102 | 6 | A | 46849787 | 0.7955 | 0.0765 | 0.7189 | 0.6925 | 107.7709 | 210.3497 | Photoperiod insensitivity (Liu et al., |
ss246165779 | 7 | C | 3376487 | 0.1586 | 0.8869 | 0.7283 | 0.6942 | 108.3583 | 209.3325 | |
ss246411658 | 7 | C | 42514762 | 0.2539 | 0.9883 | 0.7344 | 0.7409 | 116.1999 | 221.5186 | Seed daidzein (Gutierrez-Gonzalez et al., |
ss246490128 | 8 | C | 8917276 | 0.8971 | 0.1160 | 0.7810 | 0.7567 | 120.5933 | 269.0964 | Pod maturity (Reinprecht et al., |
ss247294954 | 10 | C | 44278194 | 0.0879 | 0.8426 | 0.7546 | 0.7256 | 114.3218 | 296.2328 | First flower (Kuroda et al., |
ss247790225 | 12 | A | 34375177 | 0.1350 | 0.8672 | 0.7322 | 0.6971 | 108.8819 | 196.7905 | Branching (Liu et al., |
ss249030246 | 16 | A | 10794262 | 0.9233 | 0.2040 | 0.7193 | 0.6777 | 106.5537 | 142.4584 | Seed oil (Mao et al., |
ss249429323 | 17 | A | 34433642 | 0.1097 | 0.9078 | 0.7982 | 0.7783 | 124.9834 | 301.3143 | |
ss250485410 | 20 | C | 34577475 | 0.0967 | 0.9680 | 0.8713 | 0.8671 | 143.8360 | 294.0404 | First flower (Reinprecht et al., |
Three loci with selection signals (ss246490128 on chromosome 8, ss247294954 on chromosome 10, and ss250485410 on chromosome 20) had pleiotropic effects spanning all of the following physiological aspects: photoperiod, seed quality, defense, and yield-related traits. Among the top 10 loci ranked by the strength of their selection signals, ss245627275 on chromosome 5, ss246165779 on chromosome 7, and ss249429323 on chromosome 17 have not previously been reported to be associated with any trait. A gene near ss245627275 was annotated as a RAS-related nuclear protein, a gene near ss246165779 as having nucleoside diphosphate kinase activity, and a gene near ss249429323 as having protein domain specific binding activity (Table
Population-specific alleles for Chinese soybean genotypes.
ss245630590 | C | 5 | 8603833 | 0.0939 | |
ss246176034 | G | 7 | 4488650 | 0.0421 | First flower (Orf et al., |
ss247314990 | G | 10 | 47212732 | 0.0722 | Pod maturity (Specht et al., |
ss247667029 | G | 12 | 8693064 | 0.0618 | Seed isoflavone (Han et al., |
Fang et al. (
The average genetic diversity and PIC value for the combined set of all accessions were 0.3489 and 0.2769, respectively. Compared with previously reported SNP-based results (Hao et al.,
AMOVA, model-based population structure analysis, NJ cluster analysis, and PCA were used to examine whether the 577 soybean accessions were of highly diverse origin, and whether the accessions from China and the USA form a homogeneous group or two genetically distinct subgroups. These methods consistently gave similar results, both for the number of groups and for the membership of accessions within groups: the US accessions were genetically distinct from the Chinese accessions, forming two clearly separate subpopulations. Cluster analyses grouped accessions with similar geographical origins together, in accordance with previous studies (Li et al.,
In terms of kinship, the 577 accessions were distantly related: in the combined analysis of the two subpopulations, 85.37% of the pairwise kinship estimates were equal to zero (Figure
Identifying loci with selection signals is a key step in understanding how populations have adapted to particular agronomic practices and/or environments. The
In the present study, we used an
LQ, DW, ZL, and SW: conceived and designed the experiments; ZL, ZW, and XF: performed the experiments; ZL, HL, YL, RG, and YG: analyzed the data; ZL, HL, and ZW: wrote the paper.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: