Genome-Wide Association Studies of Mineral Content in Common Bean

Gunjača, Jerko; Carović-Stanko, Klaudija; Lazarević, Boris; Vidak, Monika; Petek, Marko; Liber, Zlatko; Šatović, Zlatko

doi:10.3389/fpls.2021.636484

ORIGINAL RESEARCH article

Front. Plant Sci., 05 March 2021

Sec. Plant Breeding

Volume 12 - 2021 | https://doi.org/10.3389/fpls.2021.636484

This article is part of the Research TopicBiological and Genetic Basis of Agronomical and Seed Quality Traits in LegumesView all 30 articles

Genome-Wide Association Studies of Mineral Content in Common Bean

Jerko Gunjača^1,2

Klaudija Carović-Stanko^2,3*

Boris Lazarević^2,4

Monika Vidak²

Marko Petek⁴

Zlatko Liber^2,5

Zlatko Šatović^2,3

¹Department of Plant Breeding, Genetics and Biometrics, Faculty of Agriculture, University of Zagreb, Zagreb, Croatia
²Centre of Excellence for Biodiversity and Molecular Plant Breeding (CoE CroP-BioDiv), Zagreb, Croatia
³Department of Seed Science and Technology, Faculty of Agriculture, University of Zagreb, Zagreb, Croatia
⁴Department of Plant Nutrition, Faculty of Agriculture, University of Zagreb, Zagreb, Croatia
⁵Department of Biology, Faculty of Science, University of Zagreb, Zagreb, Croatia

Micronutrient malnutrition is one of the main public health problems in many parts of the world. This problem raises the attention of all valuable sources of micronutrients for the human diet, such as common bean (Phaseolus vulgaris L.). In this research, a panel of 174 accessions representing Croatian common bean landraces was phenotyped for seed content of eight nutrients (N, P, K, Ca, Mg, Fe, Zn, and Mn), and genotyped using 6,311 high-quality DArTseq-derived SNP markers. A genome-wide association study (GWAS) was then performed to identify new genetic sources for improving seed mineral content. Twenty-two quantitative trait nucleotides (QTN) associated with seed nitrogen content were discovered on chromosomes Pv01, Pv02, Pv03, Pv05, Pv07, Pv08, and Pv10. Five QTNs were associated with seed phosphorus content, four on chromosome Pv07, and one on Pv08. A single significant QTN was found for seed calcium content on chromosome Pv09 and for seed magnesium content on Pv08. Finally, two QTNs associated with seed zinc content were identified on Pv06 while no QTNs were found to be associated with seed potassium, iron, or manganese content. Our results demonstrate the utility of GWAS for understanding the genetic architecture of seed nutritional traits in common bean and have utility for future enrichment of seed with macro– and micronutrients through genomics-assisted breeding.

Introduction

Micronutrient malnutrition, known as “hidden hunger,” particularly the lack of minerals such as Fe and Zn, is the main global nutritional problem (Hirschi, 2009; Diepenbrock and Gore, 2015; Semba, 2016; Yeken et al., 2018), due to great importance of micronutrients in fundamental biological functions (Tapiero et al., 2003). Common bean (Phaseolus vulgaris L.) is a species of great interest for human diet worldwide, gaining attention as functional food offering benefits for human health (Câmara et al., 2013). It provides macro- and micronutrients (especially Fe and Zn) and because of high protein content, together with other pulses common bean is known as poor man’s meat (Gouveia et al., 2014; Mahajan et al., 2017; Nwadike et al., 2018). The nutritional composition of common bean landraces depends on factors like origin, genotype and environmental conditions (Gouveia et al., 2014). Moreover, researches that have analyzed the genetic control of seed composition were mainly focused on minerals such as iron, phosphorus and zinc (Blair et al., 2009; Cichy et al., 2009), since they are among the most important nutritional deficiencies in humans.

The availability of molecular markers has enabled the determination of the origin and diversity of populations, as well as the elucidation of the genetic basis of important complex agronomic traits with increased resolution (Gioia et al., 2013; Chávez-Servia et al., 2016; Valdisser et al., 2017). For this purpose, microsatellite markers have been the most widely used markers over the past decade (Razvi et al., 2017; Valdisser et al., 2017). In recent years, single nucleotide polymorphism (SNP) markers have been developed and increasingly used for genetic and evolutionary studies, analysis of genome structure, genetic diversity analysis, genome-wide association mapping and integration of genetic maps representing a useful tool for plant breeding purposes (Goretti et al., 2014; Villordo-Pineda et al., 2015; Nemli et al., 2017; Valdisser et al., 2017). Diversity Arrays Technology (DArTseq), based on genome complexity reduction and SNP detection through hybridization of PCR fragments (Jaccoud et al., 2001), has been successfully used for the construction of dense linkage maps and quantitative trait locus QTL analysis, genome-wide association studies (GWAS) and genetic diversity studies (Valdisser et al., 2017).

Over the last decade, genome-wide association study (GWAS) has become a popular approach for studying traits of agricultural importance and has gained popularity particularly for screening a great number of accessions to gain insight into understanding the genetic basis of complex traits (Yu et al., 2006). In common bean, GWAS has been used to identify genes controlling traits such as disease resistance (Shi et al., 2011; Perseguini et al., 2016; Zuiderveen et al., 2016; Tock et al., 2017, Fritsche-Neto et al., 2019), drought-tolerance related traits (Galeano et al., 2012; Hoyos-Villegas et al., 2017), agronomic traits in general (Nemli et al., 2014; Kamfwa et al., 2015b; Moghaddam et al., 2016; Ates et al., 2018; Nascimento et al., 2018; Resende et al., 2018), nitrogen fixation (Kamfwa et al., 2015a), cooking time (Cichy et al., 2015), flooding tolerance (Soltani et al., 2017; Soltani et al., 2018), content of micronutrients (Mahajan et al., 2017; Katuuramu et al., 2018; Myers et al., 2019; Diaz et al., 2020; Erdogmus et al., 2020), and pod shattering (Rau et al., 2019).

The basic goal of GWAS is to detect markers that are either associated with a trait of interest directly or are in linkage disequilibrium (LD: non-random association of alleles at different loci in a given population) with a quantitative trait locus (QTL) that controls it. Cited GWAS studies on content of micronutrients detected numerous QTLs associated with Fe, Ca, Zn and Mn content, located an all chromosomes. Most of them represent minor genes, usually explaining around 10% or less of total phenotypic variation, with only a few exceptions. Earlier studies employing classical QTL analysis summarily detected a large number of QTLs; some of them were associated with major genes, but they were usually population or environment specific (Blair et al., 2009; Cichy et al., 2009; Blair et al., 2010, 2011). Pooling together the populations from different studies, meta-analysis resulted in the reduction of the original set of 87 detected QTLs into a set of 12 meta-QTLs, two specific for iron and zinc, and eight common for both minerals (Izquierdo et al., 2018). All discovered QTLs provide promising potential for use in plant breeding programs targeted at mineral biofortification, such as HarvestPlus (Pfeiffer and McClafferty, 2007). Newly developed biofortified cultivars exhibit potential for improving the iron status in iron-deficient individuals (Tako et al., 2015; Haas et al., 2016).

Reviewing the GWAS studies in common bean, it is possible to notice that many differences exist in applied strategy, regards the methodology in general, as well as in choices made at the different steps of the process. The markers which are in LD with QTLs controlling the analyzed trait cannot be straightforwardly identified because besides the physical linkage, LD can also be created by the genetic relatedness between individuals and/or population structure. These factors can extend LD over larger chromosomal regions, thus increasing the number of spurious associations that are most likely just false positives. This can be illustrated by the difference between uncorrected and kinship/structure corrected measures of LD (r²). While uncorrected r² indicates strong LD even for the SNPs located on the opposite ends of the chromosome (Valdisser et al., 2017; Resende et al., 2018; Diniz et al., 2019), bias-corrected measures indicate LD decay of r² to 0.1 at distances of approximately 250 kbp (Diaz et al., 2020), 400 kbp (Valdisser et al., 2017), 700 kbp (Resende et al., 2018), or up to 1 Mbp (Diniz et al., 2019). Therefore, the prevailing method of detecting marker-trait associations is based on a mixed linear model (MLM) proposed by Yu et al. (2006). It is also known as K + Q model because includes the fixed effect of population structure (Q) and random effect of kinship (K), and it is implemented in software packages such as TASSEL (Bradbury et al., 2007) and GAPIT (Lipka et al., 2012), providing different options for kinship/structure correction. In the next step, raw p-values were subjected to multiple testing adjustment in order to remove false positives. Some authors prefer the most stringent method of Bonferroni (Galeano et al., 2012; Kamfwa et al., 2015b; Zuiderveen et al., 2016), the others more relaxed variants of false discovery rate (FDR) adjustment (Cichy et al., 2009; Ates et al., 2018; Katuuramu et al., 2018) or using empirical distribution created by bootstrap to determine the cutoff point (Moghaddam et al., 2016; Soltani et al., 2018). Two studies (Moghaddam et al., 2016; Rau et al., 2019) combined described single-locus approach with the multilocus mixed model (MLMM) of Segura et al. (2012). MLMM is based on the same Q + K model but fitted in the stepwise procedure, adding or excluding a marker as a fixed effect (cofactor) in the model at each step.

In the present study, a panel of 174 accessions representing Croatian common bean landraces was used for GWAS based on DArTseq-derived SNP markers with the aim of identifying quantitative trait nucleotides (QTNs) associated with variation in seed content of eight nutrients (N, P, K, Ca, Mg, Fe, Zn, and Mn). The secondary aim of the study was to compare different methodology options and identify their possible pitfalls and shortcomings.

Materials and Methods

Plant Material and Mineral Content Assessment

The study was performed using 174 accessions representing the most commonly used Croatian landraces of common bean (Carović-Stanko et al., 2017) held at the University of Zagreb Faculty of Agriculture, Department of Seed Science and Technology. The list of accessions with their passport data, phaseolin type and cluster membership is given in Supplementary Table 1. Phenotypic data on mineral content were drawn from an earlier study (Palèić et al., 2018) analyzing nitrogen (N), phosphorus (K), potassium (K), calcium (Ca), magnesium (Mg), iron (Fe), zinc (Zn), and manganese (Mn) variability in a broader panel of 226 accessions of the collection.

Genotyping and Data Preparation

Genotyping was carried out using microsatellite and DArTseq-derived SNP markers. Twenty-six microsatellite markers yielding a total of 135 alleles in the panel consisting of 174 common bean accessions were used to infer the population structure using STRUCTURE 2.3.3 software (Pritchard et al., 2000) as described in Carović-Stanko et al. (2017). Phaseolin type of each accession was determined by amplification of phaseolin sequences (Kami et al., 1995) as described in Carović-Stanko et al. (2017). DArTseq analysis was performed by Diversity Arrays Technology Pty Ltd., Bruce, Australia¹. The quality of DArTseq-derived SNP markers was determined by the parameters ‘reproducibility’ (percentage of technical replicate pairs scoring identically for a given marker), ‘call-rate’ (percentage of samples for which a given marker was scored) and ‘MAF’ (minor-allele frequency) (Wenzl et al., 2004). Marker sequences were aligned against the reference genome of Phaseolus vulgaris (Schmutz et al., 2014) using BLASTN (Zhang et al., 2000). Final SNP data quality control was performed by excluding all SNPs with MAF < 0.05 and all SNPs with >0.05 heterozygotes, resulting in the final set of 6,311 high-quality DArTseq-derived SNPs. The missing SNP data were imputed by using Beagle 5.1 genotype imputation method (Browning et al., 2018). The imputed data set was then used to construct a kinship matrix by applying four methods implemented in TASSEL 5 software (Bradbury et al., 2007): (1) centered IBS (Endelman and Jannink, 2012), (2) normalized IBS (Yang et al., 2011), (3) dominance centered IBS (Muñoz et al., 2014), and (4) dominance normalized IBS (Zhu et al., 2015). Additionally, we used the corrected relatedness matrix as proposed by Diniz et al. (2019).

Linkage Disequilibrium

Non-random association between alleles at different loci was measured by r². Besides straightforward r², corrected measures designed to remove the bias caused by population structure (r_S²), kinship (r_V²) and both (r_VS²) were also estimated (Mangin et al., 2012). Both measures involving kinship correction were estimated using five different kinship matrices described above. In order to visualize LD decay as a function of distance, all measures were fitted to Hill and Weir model (Hill and Weir, 1988).

Genome-Wide Association Study (GWAS)

Before carrying out the GWAS, missing phenotypic data were imputed using PHENIX method as implemented in the eponymous R package (Dahl et al., 2016). Prior to imputation, outliers were removed using the “trim” option in “phenix” (trim.sds = 1.96). GWAS was performed using both single (Yu et al., 2006) and multi-locus model approaches (Segura et al., 2012). Mixed linear models fitted in both cases included corrections for population structure and genetic relatedness (Q and K matrices). Population membership estimates were derived from microsatellite data using STRUCTURE (Pritchard et al., 2000) and the centered IBS method was used for adjustment of genetic relatedness. Single locus models were fitted in TASSEL 5 while the multi-locus models were employed in R package MLMM (Segura et al., 2012). After fitting single-locus models in TASSEL, the resulting raw p-values were subjected to multiple testing adjustment. In order to score for more potential marker-trait associations, Storey’s FDR approach was used instead of frequently applied Bonferroni correction (which tends to be too conservative). Row p-values from TASSEL were converted into Storey’s q-values using R package “qvalue” (Storey et al., 2020), and q-value of 0.2 was selected as the significance threshold. Distributions of p-values from TASSEL fits across the genome were visualized by Manhattan plots created using R package “CMplot” (Yin et al., 2020). Before creating Manhattan plots for all traits with significant SNPs, for each trait, an approximate threshold was calculated as p-value of a hypothetical SNP that would have a q-value of 0.2. A similar approximate significance threshold was created for MLMM – p-values from step zero were used to estimate p-value of hypothetic SNP that would have a q-value of approximately 0.2. The distribution of alleles across subpopulations for QTNs was visually inspected by creating violin plots.

Results

Genotyping and Data Preparation

In concordance with the results described by Carović-Stanko et al. (2017), the STRUCTURE analysis based on 26 microsatellite loci identified K = 2 as the most likely number of clusters (ΔK = 20,533.24) assigning the accessions of Mesoamerican origin (phaseolin type I; “S”) to cluster A, while the accessions of Andean origin (phaseolin type II or III) formed the cluster B. At K = 3 (ΔK = 1,935.93), cluster B defined for K = 2 split up into two clusters separating the great majority of accessions of phaseolin type II (“H” or “C”) from those having phaseolin type III (“T”). At K = 3, 48 accessions (27.59%) belonged to cluster A, 29 (16.67%) to cluster B₁, and 80 (45.96%) to cluster B₂. For 17 accessions (9.77%), membership probabilities were lower than 75% in any of the clusters and were thus considered as “mixed origin.” Phaseolin type and cluster membership of each accession are given in Supplementary Table 1. The Q-values of each accession obtained at K = 3 were used for the control of genetic background in GWAS.

Out of 17,514 polymorphic markers 8,092 (46%) had high scoring reproducibility (>0.95), high call-rate (>0.90) and minor allele frequency (MAF) higher than 5%. From 8,092 SNP sequences, 6,599 (82%) high-quality SNPs were aligned to 11 chromosomes of common bean. The average number of SNPs per chromosome was 599.91, ranging from 403 on chromosome 4 to 834 on chromosome 2. The mean number of SNPs per Mbp was 12.85 or, on average, one SNP every 77,828 base pairs. Two hundred and eighty-eight SNPs for which more than 5% of accessions were heterozygous were removed for further analysis, and missing data were imputed for the remaining 6,311 SNPs (Supplementary Table 2).

Linkage Disequilibrium

Linkage disequilibrium, non-independence of alleles at different loci was assessed as the squared correlation between loci (r²). Bias caused by relatedness and/or population structure was removed by adjusting r²: (a) using kinship estimates (r_v²), (b) using Q-values obtained by STRUCTURE (r_s²), or (c) using both (r_vs²). Fitted model for non-adjusted estimate r² (Figure 1A) showed that LD decay was barely visible within 0–10 Mbp distance range, remaining above 0.3 even for pairs of loci at the opposite ends of a chromosome. On the contrary, when adjusted for population structure, LD decayed to 0.1 at an approximate distance of 4 Mbp. The r² value of 0.1 could then be taken as an arbitrary threshold for comparison of different measures. Bias caused by relatedness was even stronger, and there was almost no difference between adjustment for kinship only and for kinship and population structure, both reaching LD decay threshold (r² = 0.1) at approximately 1 Mbp. Both measures, r_v² and r_vs², shown on Figure 1A were based on a centered identity-by-state (IBS) kinship matrix. The r_vs² based on centered IBS was also shown in Figure 1B, in which it was compared to r_vs² values based on four other kinship matrices. Estimation curves for normalized IBS and distance-based kinships were completely overlapped, and the difference between them in comparison to centralized IBS was visible only due to magnification achieved by reducing both axes to approximately 1/10 of their total range. Two dominance-based kinship curves were completely overlapped as well, but characterized with lower decay rate, reaching 0.1 threshold at the distance of approximately 3 Mbp. The results suggested that adjustment for relatedness preferably using a centralized IBS kinship matrix was a necessary requirement for GWAS.

FIGURE 1

Figure 1. Linkage disequilibrium (LD) decay as a function of distance between SNPs within a chromosome: (A) comparison of four r² measures – r_v² and r_vs² are based on centered IBS kinship: (1) r² = unadj, (2) r_s² = stru, (3) r_v² = kin, (4) r_vs² = kin + stru); (B) comparison of r_vs² based on five different kinship matrices: (1) dominance normalized, (2) dominance centered, (3) distance, (4) normalized, (5) centered.

Genome-Wide Association Study (GWAS)

Before carrying out the GWAS, missing phenotypic data were imputed using a centered IBS kinship matrix. Prior to imputation, outliers (>1.96 standard deviations) were trimmed, thus removing from one to ten the most extreme data points per trait. Descriptive statistics for all minerals based on imputed data set are given in Table 1.

TABLE 1

Table 1. Descriptive statistics for seed mineral content in 174 Croatian common bean accessions.

The largest number of significant SNPs considered as QTN was discovered for N (Figure 2A); in total there were 22 significant QTNs on seven chromosomes. Among them, the highest observed -log₁₀(p) peaks were observed at two pairs of QTNs located on chromosome 3 (Pv03) and chromosome 10 (Pv10), both explaining 7% of total phenotypic variation. Five QTNs were associated with P: four on chromosome 7 (Pv07) and one on chromosome 8 (Pv08) (Figure 2B), explaining 8–9% of variation. A single significant QTN was found for Ca on chromosome 9 (Pv09) (Figure 2C) and Mg on Pv08 (Figure 2D), explaining 9 and 13% of variation, respectively. Finally, two QTNs associated with Zn were located 1.4 Mbp apart from each other on Pv06 (Figure 2E), explaining 8 and 10% of variation. No QTNs were found to be associated with K, Fe, and Mn.

FIGURE 2

Figure 2. Manhattan plots for significant markers detected by TASSEL: (A) N; (B) P; (C) Ca; (D) Mg; (E) Zn.

As expected, a multi-locus model fitting in MLMM resulted in much less marker-trait association discoveries. Out of 22 QTNs associated with N by TASSEL, only two were confirmed by MLMM: one out of four on Pv01 and the first of the two QTNs on Pv10. Similarly, only one QTN out of four found by TASSEL on Pv07 was associated with P. An additional discovery by MLMM is QTN associated with N on the Pv05, not previously detected by TASSEL.

Regarding the relationships between sizes of different variance components estimates obtained by MLMM, comparison of the residual sum of squares (RSS) plots for N and P (Figure 3) could be summarized in two key points: (1) population structure explained 40% of total N variation and 0% of total P variation; (2) error variation was of similar size as genetic variation in N and twice as large in P. Consequentially, despite similar relative size of genetic variation for N and P, MLMM detected three QTNs with p-values below the threshold for N, and only one for P.

FIGURE 3

Figure 3. Variability breakdown at different MLMM steps for (A) N; (B) P. Structure (blue), SNPs (green), kinship (red), and error (white). Solid vertical line designates optimal step.

The summary of all marker-trait associations detected either by TASSEL or MLMM is given in Table 2. The highest overall explanatory power was recorded for the QTN Mg_8, explaining 13% of the total phenotypic variability for Mg. Instead of an individual R² value for each marker, only a cumulative value for the full set of markers could be extracted from MLMM output. The associated markers were distributed over the whole genome, except for the chromosomes Pv04 and Pv11; N was the trait with the largest number of discovered associations, but individual effects of markers were lower than for other traits. When the TASSEL discovered a sequence of QTNs within up to 0.3 Mbp distance range, MLMM would likely retain just one of them. Finally, most of the markers were positioned closer to the chromosome ends, and just a few closer to the centromeric region.

TABLE 2

Table 2. Positions of the quantitative trait nucleotides (QTNs) associated with seed mineral content in common been.

Strong effect of population structure in N, clearly visible on Figure 3, deserve to be further elucidated in conjunction with the effect of allele substitution at QTN sites. Figure 4 shows N content distribution in different allele classes for QTNs N_1.4 (a-b), N_3.1 (c-d), and N_5.1 (e-f), across whole population (a, c, e) and within subpopulations (b, d, f). Reference allele for all QTNs was always present in all subpopulations and mean N content of individuals carrying reference allele in the subpopulation A (Mesoamerican origin) is always somewhere in between means of the subpopulations B1 and B2 (Andean origin). There are the three possible scenarios for the distribution of the SNP allele. It could be present only in subpopulations of Andean origin (Figure 4B), but its positive effect observable in both B1 and B2 almost disappeared at whole population level, being masked by the effect of population structure (Figure 4A). In the second scenario, SNP allele is present only in subpopulation A (Mesoamerican) expressing a clear negative effect (Figure 4D), that gets shrunken by the effect of the population structure (Figure 4C). Finally, if the SNP allele is present in all subpopulations, its effect varies from one subpopulation to another (Figure 4F), to became almost invisible at the level of whole population (Figure 4E).

FIGURE 4

Figure 4. N seed content distribution for different allele classes across the whole population (left) and within subpopulations (right; A Mesoamerican; B1 Andean; B2 Andean) for: (A,B) N_1.4; (C,D) N_3.1; (E,F) N_5.1. Diamonds designate population/subpopulation means for reference allele homozygotes (gray), SNP homozygotes (yellow), and heterozygotes (blue).

Discussion

Genetic Structure and Seed Mineral Content in Common Bean

Concerning the origin of analyzed common bean accessions, we performed a model-based cluster analysis based on microsatellite markers which revealed the presence of three clusters in nearly complete congruence with the results of phaseolin type analysis. The results are consistent with previous studies showing that the European germplasm originates also from Mesoamerican and Andean gene pools, where the Andean is more prevalent (Bellucci et al., 2014). Similar results were obtained for Portuguese germplasm (Leitão et al., 2017) and the germplasm originating from countries neighboring Croatia (Bosnia and Herzegovina: 60% of the accessions of Andean origin; Serbia: 63%; Slovenia: 67%) (Maras et al., 2015).

As one of the requirements for successful GWAS, the presence of a reasonable amount of phenotypic diversity amongst genotypes in the study panel was already established in an earlier study (Palèić et al., 2018). For most of the analyzed traits, the amount of present phenotypic variability was either comparable or slightly narrower (especially if they were comprised of the accessions from regions located close to the center of origin) than variability reported for other collections. Furthermore, Croatian accessions of Mesoamerican origin had superior seed mineral content compared to those of Andean origin which was in congruence with the research of Ribeiro et al. (2012), except for iron, which is congruent to results reported by Beebe et al. (2000) and Islam et al. (2002).

Linkage Disequilibrium

The major issue in GWAS is to separate the true signal of marker-gene association from a plethora of false signals created by population structure and relatedness of individuals. The impact of kinship and population structure on LD estimates in common beans has been extensively discussed in a recent study by Diniz et al. (2019) that served as a model for designing the methodology of the present study. Despite all the differences in genetic composition and origin of studied populations (Croatian landraces vs. composite panel consisting of commercial cultivars, breeding lines, recombinant inbred lines and landraces) LD estimates from the present study have all the hallmarks of results reported by Diniz et al. (2019). The extent of bias caused by kinship and structure on LD estimates is quite similar: both studies reveal that more bias is introduced by kinship than by structure as there is only a negligible difference between r_v² and r_vs², and finally, as estimated by r_vs², LD decayed to 0.1 at a distance of approximately 1 Mbp. In addition to the conclusions of Diniz et al. (2019), the present study revealed that there is no essential difference between distance-based and centered or normalized IBS kinship estimate, as well as that all three of them outperformed both dominance-based kinship estimates. Almost identical performance of r_v² and r_vs² measures support Astle and Balding (2009) opinion that adjustment for kinship already contains the adjustment for population structure as well. Several authors have tried to resolve this issue through different modifications of the kinship matrix in order to remove information already contained in the model as fixed effects of population structure. So far, there is no consensus regards this matter, and as Gianola et al. (2016) concluded, “it is impossible to answer unambiguously the question of which approach is best.”

GWAS Methodology

There are two possible reasons for the discrepancy in the number of marker-trait associations identified by TASSEL and MLMM: (1) stepwise reduction of available genetic variability in MLMM; (2) the use of different methods for estimation of genetic and error variance.

Stepwise reduction of available genetic variability in MLMM: Single locus model used in TASSEL explores the entire genetic variability for fitting each SNP, while the multilocus model (MLMM) reduces available genetic variability for the next step by fixing selected SNP at each fitting step. At the final forward step, the genetic variability is completely exhausted; MLMM stops and runs the backward part of the stepwise fitting, as illustrated by RSS plots in Figure 3. The largest potential number of discoveries is equal to the number of steps, but the number of actual discoveries is equal to the ordinal number of the optimal step. Search for the optimal step is based on the selected threshold value, and it is aimed at finding the last (i.e., first) step in which all p-values for fixed markers are below the threshold.

The use of different methods for estimation of genetic and error variance in TASSEL and MLMM: Stepwise reduction of available genetic variability in MLMM: it is not possible to make a straightforward comparison of variance estimates from TASSEL and MLMM, because MLMM output provides only relative values of genetic and error variance, used to create RSS plots on Figure 3. At each step MLMM reports the value of “pseudo-heritability,” the ratio between genetic and sum of genetic and error variance. E.g., this estimate at step 0 for N is equal to 0.548, thus substantially larger than 0.377, the value of equivalent ratio calculated using the TASSEL null model variance estimates.

Furthermore, as the results of the TASSEL analysis were obtained by using the option to re-estimate variance estimates for each marker fit, the analysis was redone using the timesaving P3D (“population parameters previously determined”) option. P3D approach uses null model estimates for all marker fits (Zhang et al., 2010). Despite the perfect correlation between the p values obtained by the two options (“re-estimate” vs. P3D), the results were in disagreement in terms of the number of discoveries. Namely, using the selected threshold of q = 0.2, the analysis using P3D option detected no significant SNPs, because the smallest estimated q-value was as high as 0.25. It is likely that this outcome is related to panel size as well as the number of markers.

Application of stringent methods for multiple testing adjustment in the present study would result in no potential QTN finding, thus turning all of them into false negatives. Using a more relaxed approach by selecting Storey’s q-value of 0.2 as the cutoff point yielded the reported set of QTNs. It comes with the cost of 20% false positives, i.e., when the analysis of N content in TASSEL detected 22 marker-trait associations, 5 of them were actually false. Although other common bean studies used lower FDR cutoff values of 0.01–0.10 (Hoyos-Villegas et al., 2017; Ates et al., 2018; Katuuramu et al., 2018), it is not unusual to find recently published GWAS analyses in some other crops with q-value threshold of 0.2 (e.g., Muqaddasi et al., 2017; Novakazi et al., 2019). As Storey and Tibshirani (2003) have pointed out: “because significant features will likely undergo some subsequent biological verification, a q-value threshold can be phrased in practical terms as the proportion of significant features that turn out to be false leads”; significant QTNs can be treated as merely input values for further evaluation such as functional annotation.

QTN Discoveries

There are only few published GWAS analyses of nutrient content in common bean seed (that we are aware of) Two QTNs associated with seed iron content found on chromosome 6 by Diaz et al. (2020) are different from three QTNs associated with bioavailable iron found on the same chromosome by Katuuramu et al. (2018), along with two others on chromosomes 7 and 11. Zinc QTNs were found on chromosomes 6 (present study), 7 (Katuuramu et al., 2018), and 8 (Diaz et al., 2020); manganese QTNs on chromosomes 2, 3, 5, 8, and 11 (Erdogmus et al., 2020). The most abundant were calcium and nitrogen QTNs: for Ca they were found on chromosomes 1, 2, 3, 4, 8, 9, 10, and 11 (Katuuramu et al., 2018; Erdogmus et al., 2020, present study); for N on chromosomes 1, 2, 3, 5, 7, 8, 9, and 10 (Kamfwa et al., 2015a; present study). In addition to this, five phosphorus and a single magnesium QTN were discovered in the present study. According to the available information for their positions, there are no common QTNs detected in more than one study. It could be added, more as a curiosity, that QTN for nitrogen derived from atmosphere detected on chromosome 7 (at 4,048 kbp) by Kamfwa et al. (2015a) is positioned between the last two in series of four phosphorus QTNs stretching between 3,864 and 4,076 kbp on the same chromosome from the present study. The explanatory power of QTNs from all studies is relatively low; there are just a few of them that could explain more than 10% of total phenotypic variability. The single exception is the study of Mahajan et al. (2017), who reported much higher explanatory power of the discovered QTLs, but is not comparable with the others, because it was based on different type of markers (SSRs).

Classical QTL analyses also yielded numerous marker-trait associations, which can also be used for comparison with GWAS findings, knowing the approximate positions of the QTLs. This is at least possible for meta-QTLs, thanks to their physical positions provided by Izquierdo et al. (2018). Three iron and zinc QTNs found by Katuuramu et al. (2018) fall into meta-QTL intervals on chromosomes 6 and 7, but for none of them there seem to be any candidate genes (reported by either group of authors). The explanatory power of meta-QTLs is stronger than for QTNs; they explain from 10 to 27% of total phenotypic variation probably because the mapping populations are derived from crosses between two homozygous parents.

The important aggravating factor for the detection and use of marker-trait associations is the fact that a lot of them are either environment or population specific. In studies that collected phenotypic data from more than one environment, most of the QTNs were environment-specific; for example, two iron QTNs detected in two different seasons by Diaz et al. (2020), or 3/7 (Ca) and 2/10 (Mn) QTNs detected in both years on both locations by Erdogmus et al. (2020). European collections of landraces (such as this Croatian collection) usually represent a mixture of genotypes of Andean and Mesoamerican origin and some inter-genepool hybrids. The strong effect of population structure can alter the effect of allele substitution to such proportion that it can be completely hidden at the whole population level, which can somehow be related to Blair and Izquierdo (2012) conclusion that QTLs can be genepool specific and therefore not detectable in inter-genepool crosses.

Biofortification

Among the various possible strategies for use of QTNs in the biofortification breeding programs, Izquierdo et al. (2018) consider gene pyramidizing through marker-assisted selection too challenging, illustrating it by the example of stacking eight meta-QTL regions associated with both Fe and Zn in a single breeding line that has a probability of one in 256. They, therefore, suggest the genomic selection as the most promising strategy. Gains that could be achieved by allele substitution in the present study are smaller or larger than gains reported by other authors, depending on the mineral. Zn gains are larger than in Diaz et al. (2020) who reported a gain of 0.85 ppm, or Cichy et al. (2009) who reported gains of 0.6–1.5 ppm, and closer to Blair et al. (2009) who reported gains of 1.02–2.53 ppm. Ca and Mg gains are smaller than in Casañas et al. (2013) who reported gains of 0.85–11.40 g kg^–1 for Ca and 0.33–0.40 g kg^–1 for Mg, but for the dry weight of seed coat (in contrast to whole seed in present study). Blair et al. (2013) discuss the differences in concentrations of nutrients in seed coat and cotyledon, and conclude that they are due to different genes involved in mineral accumulation, as well as that through domestication accumulation of some nutrients shifted from seed coat to cotyledons. The achieved gains will not be fully utilized in human consumption, due to the presence of anti-nutrients, as well as due to losses during storage, processing and cooking. Besides the expected diminishing effect on the total content of nutrients, pre-soaking and cooking can as well increase the amount of bioavailable nutrients, due to a parallel decrease of the content of anti-nutrients (Fernandes et al., 2010). This is confirmed by Katuuramu et al. (2018), who analyzed mineral content of cooked beans and concluded that the results are in agreement with the studies on raw beans. Initial studies with already developed biofortified cultivars with twofold increased iron content (approximately from 50 to 100 ppm) showed only moderate increase in absorbed iron quantity, due to presence of higher levels of phytate (Petry et al., 2014; Tako et al., 2015). However, the consumption of biofortified beans can still significantly improve the iron status, as demonstrated in feeding trial (Haas et al., 2016). An alternative strategy for increasing the bioavailability of nutrients can be therefore the breeding for decreased anti-nutrient content, but just for areas with prevalent micronutrient deficiencies, while in others increased amounts of anti-nutrients like phytate might have beneficial effects on human health, reducing the risk of cancer and obesity (Blair et al., 2012).

Data Availability Statement

The SNP dataset generated for this study is included in the Supplementary Table 2.

Author Contributions

ZŠ and KC-S: conceptualization. JG and ZL: methodology. KC-S, BL, MV, MP, and ZL: investigation. JG and KC-S: writing-original draft preparation. ZŠ and ZL: writing – review and editing. All authors read and approved the final manuscript.

Funding

This research was supported by the project KK.01.1.1.01.0005 Biodiversity and Molecular Plant Breeding, Centre of Excellence for Biodiversity and Molecular Plant Breeding (CoE Cro-PBioDiv), Zagreb, Croatia and by Croatian Science Foundation under the project UIP-11-2013-3290.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank the reviewers and the editor for their thoughtful comments and efforts towards improving our manuscript.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2021.636484/full#supplementary-material

Supplementary Figure 1 | Genetic structure of Croatian common bean accessions as estimated using Structure (Pritchard et al., 2000) at K = 2 and K = 3. Accessions are grouped according to phaseolin types (I, II, III). Each accession is represented by a column, and the color corresponds to the membership probability (i.e., Q-value) of the individual belonging to a particular cluster.

Supplementary Table 1 | Common bean accession used in this study.

Supplementary Table 2 | Information on SNP markers used in the study.

Abbreviations

ALK, anaplastic lymphoma kinase; FDR, false discovery rate; GWAS, genome-wide association study; IBS, identity-by-state; LD, linkage disequilibrium; MAF, minor allele frequency; MLMM, multilocus mixed model; P3D, “population parameters previously determined”; SNP, single nucleotide polymorphism; QTL, quantitative trait locus; QTN, quantitative trait nucleotide.

Footnotes

^ https://www.diversityarrays.com/

References

Astle, W., and Balding, D. J. (2009). Population structure and cryptic relatedness in genetic association studies. Stat. Sci. 24, 451–471. doi: 10.1214/09-STS307