Association Mapping for Fiber-Related Traits and Digestibility in Alfalfa (Medicago sativa)

Association mapping is a powerful approach for exploring the molecular genetic basis of complex quantitative traits. An alfalfa (Medicago sativa) association panel comprised of 336 genotypes from 75 alfalfa accessions represented by four to eight genotypes for each accession. Each genotype was genotyped using 85 simple sequence repeat (SSR) markers and phenotyped for five fiber-related traits in four different environments. A model-based structure analysis was used to group all genotypes into two groups. Most of the genotypes have a low relative kinship (<0.3), suggesting population stratification not be an issue for association analysis. Generally, the Q + K model exhibited the best performance to eliminate the false associated positives. In total, 124 marker-trait associations were predicted (p < 0.005). Among these, eight associations were predicted in two environments repeatedly and 20 markers were predicted to be associated with multiple traits. These trait-associated markers will greatly help marker-assisted breeding programs to improve fiber-related quality traits in alfalfa.


INTRODUCTION
Alfalfa (Medicago sativa) is one of the most important forage crops in the world due to its high biomass and choice nutritional profiles and it provides reliable sources of protein and minerals to animals. The main objective, however, in many alfalfa quality breeding programs is to improve the digestibility (Buxton and Redfearn, 1997) for poor stem digestibility would cause major loss in animal feeding values (Mowat et al., 1965). A research indicated minor improvement in alfalfa stem digestibility would impact agriculture economically (Jung and Allen, 1995). In these sense, efforts in traditional breeding of improving the quality traits as well as yield, resistance, and agronomic traits are necessary. The feeding quality traits are usually quantitative, i.e., controlled by multiple genes. Understanding the genetic architecture of these traits on the molecular level is of necessity for efficient molecular breeding.
Currently, linkage analysis via QTL mapping, genome-wide association mapping (GWAS), and joint-mapping by combining linkage and association analysis (Ed Buckler's NAM population in maize) are the three main methods for dissecting complex quantitative traits. Compared with the traditional linkage analysis based on mapping populations, association mapping, has been proposed as an alternative powerful tool to overcome limitations of pedigree based QTL mapping for it has higher mapping resolution, reduced research time, and greater allele number (Zhu et al., 2008). By utilizing historical recombinations that break LDs (linkage disequilibrium), association mapping has been widely adopted for almost all major crop species for gene identification and QTL validation, as well as better understanding of the genetic basis of complex traits (Gupta et al., 2011;Jiang et al., 2014;Wei et al., 2014;Font i Forcada et al., 2015;Portis et al., 2015). Given the facts that the most forage plants have a short selection and breeding history, Li et al. (2011) performed a GWAS analysis to map the yield and stem composition in an alfalfa breeding population of 190 individuals based on 71 SSR markers, and identified only one SSR strongly associated with acid detergent fiber (ADF) and acid detergent lignin (ADL), respectively. In this study, the association analysis was conducted for five fiber-related traits using a panel of 336 alfalfa genotypes, partially derived from the alfalfa core collection set developed by Basigalup et al. (1995) plus some germplasm from China. The study aimed to identify desirable alleles which could show significant trait-marker associations for the improvement of digestibility in alfalfa.

Plant Materials and Experimental Design
A total of 336 cultivated tetraploid alfalfa genotypes from 75 M. sativa subsp. sativa accessions were selected to construct the association mapping population (Table S1). Each accession was represented by four genotypes, except for the Chinese accessions represented by eight genotypes for each accession. Nine Chinese accessions were collected from National Herbage Germplasm Bank of China; two accessions from Syria, one from Libya, and one accession from Sudan provided by the Institute of Animal Science, Chinese Academy of Agricultural Science (Beijing, China); the rest 62 accessions were provided by USDA National Plant Germplasm System (NPGS). All genotypes were grown and clonally propagated. Field experiments were conducted at the experimental station at the Institute of Dry Farming, Hebei Academy of Agriculture, and Forestry Sciences (Hengshui, Hebei Province, 37 • 44 ′ N; 115 • 42 ′ E) in May 2012, and at the experimental station of Institute of Animal Science of CAAS (Changping, Beijing, 40 • 10 ′ N; 116 • 13 ′ E) in May 2014. The experiment at each location used a randomized completed block design with two replications, each of which contained six clones. Within each plot, each cloned plant was spaced by 30 cm and each row was spaced by 75 cm. The biomass above the ground were trimmed ∼1 month after establishment and then regrew for the remainder of the season.

Genotyping
Eighty-five polymorphisms SSR primers (Eujayl et al., 2004;Robins et al., 2007) were used for genotyping. DNA extraction, PCR amplification, electrophoresis, and SSR genotyping analysis were conducted according to the methods described by Qiang et al. (2015).

Data Analyses
Analysis of variance (ANOVA) of all phenotypic data based on the means of traits of each accession under four environments was conducted as model: Phen = genotypes + environments + e. where Phen as the phenotypic observation, genotypes as the genetic effect, environments as the effect of the four environments, and e as the residual. All analyses were performed using SAS 8.02 (SAS Institute, 1999). Broad-sense heritability (H 2 ) was calculated as the genotypic variance divided by the total variance.
A kinship matrix was calculated using SPAGeDi software (Hardy and Vekemans, 2002).
The association between the phenotypes and markers was performed using Tassel v2.1 software (Bradbury et al., 2007). Three models were tested, namely the simple general linear model (GLM, Naive-model), the structured association model (GLM, Q model), and the mixed linear model (MLM, Q + K model) . The marker-trait association was considered as significant using a threshold of P < 0.005.

Phenotypic Variation
The descriptive parameters of the five measured traits under four environments were shown in Table 1. In summary, the ADF ranged from 20.59 to 42.85%, with an average of 31.22-33.63%; the NDF ranged from 25.08 to 53.06% with a mean of 36.72-42.89%; the NDFD 30 h alternated from 12.28 to 26.75% with an average of 16.68-23.81%; the NDFD 48 h ranged from 11.02 to 22.4% with an average of 15.47-17.55%; the ADL varied from 2.70 to 8.66% with an average of 4.26-6.56%. All the datasets showed a normal or nearly normal distribution ( Figure S1). The broad-sense heritability of most traits was relatively high, ranging from 63% for NDF 30 h to 76% for NDF 48 h (Table 1), except for the heritability of ADL (45.1%), indicating the majority of studied traits were dominated by the genetic factors rather than the environmental variations. All five traits were significantly influenced by genotypes, environments and genotype × environments interactions ( Table 1).

Population Structure and Relative Kinship
The genetic relationships among the genotypes were investigated using a model-based Bayesian clustering method on the 85 SSR marker genotyping data. Two populations were identified by STRUCTURE software using a Bayesian approach, corresponding to China, and the rest of the world as indicated by Qiang et al. (2015). The kinship was estimated based on the 85 SSR data on 336 alfalfa genotypes. About 51.8% of the pairwise kinship estimates were equal to 0, while 99.7% of the relative kinship estimates were <0.2 in this alfalfa panel (Figure 1). These results indicated that most accessions have no or weak kinship with the other accessions in the panel, which might be due to the broad range collection of genotypes.

Association Analysis
For all five fiber-related traits, association analyses were conducted to assess the performance of three different models ( Table 2 and Figure 2). Generally, the observed P-value from GLM greatly deviated from the expected P-value, followed by the Q model, while the P-value from the Q + K model was close to the expected P-value (Table 2 and Figure 2). The result indicated that the false positives were well controlled in the MLM model in the study. Therefore, subsequent analyses were done based on the MLM model. Using the Q + K model, a total of 124 significant markertrait associations was predicted under at least one environment (Table S2). For ADF trait associations, six, one, four, and 12 alleles were predicted as significant in 13HS, 14CP, 14HS, and 15HS data sets, respectively, with the explained phenotypic variance (R 2 ) ranging from 2.48 to 8.13% (Table S2). For ADL, five, one, seven, and six associated alleles were identified in four environments, respectively, with the R 2 varied from 2.56 to 9.66%. For NDF, six, one, four, and nine significant associated alleles were identified in four environments, respectively, with the R 2 from 2.76 to 8.28%. For NDF 30 h, eight, two, six, and eight significant associated alleles were identified in four  environments, respectively, with the R 2 from 2.52 to 6.98%. For NDF 48 h, eight, eight, three, and 10 significant associated alleles were detected in four environments, respectively, with the R 2 from 2.63 to 9.32%. Among these associated alleles, eight alleles were repeatedly observed in two environments ( Table 3). For example, allele m13_173 associated with ADF was detected both in 14HS and 15HS. The allele m561-216 was associated with ADL both in 13HS and 14HS. In addition, among these associated alleles, 20 alleles were commonly associated with multiple fiber-related traits (Table S2). For example, the allele m561_216 was associated with ADF, ADL, NDF, NDF30, and NDF 48 h. The allele effect derived from significant marker-trait association was shown in Table S2. Among the markers associated with ADF, M115_183 had the most positive phenotypic effect (8.79), whereas m2_142 had the most negative phenotypic effect (−3.54). The alleles m215_182 and m2_142 had the most positive (9.95) and most negative (−4.11) phenotypic effect associated with NDF, respectively. For the NDFD 30 h, m190_205 had the most positive phenotypic  (Table S2). For m2, the individuals carrying the allele 136 bp had a lower NDFD48h than those carrying alleles 140 bp (Table S2). For m225, the individuals carrying the allele 191 bp had a lower NDFD48h than those carrying alleles 203 bp (Table S2). For m53, the individuals carrying the allele 176 bp had a lower ADL than those carrying alleles 131 bp (Table S2).

DISCUSSION
Association mapping has increasingly become a viable approach for the genetic dissection of quantitative traits. Due to the diverse geographical origins, the germplasm panel may contain either population structure or familial relatedness (Yu and Buckler, 2006). One of the limitations of association mapping studies is the easy detection of false positives associations caused by the existence of the genetic structure in the populations studied (Flint-Garcia et al., 2005). Several researches reported, in the structured association population, the mixed model (Q + K) showed a significant improvement in goodness of fit for traits (Flint-Garcia et al., 2005;. In this panel, association analysis was conducted for five fiber-related traits in four environments using the GLM−simple model, the Q model, and the Q + K model. The alfalfa association populations used in this study contained population structure but no obvious familial relationships (Figure 1). The quantile-quantile (QQ) plots indicated that the Q + K model performed best for all five fiber-traits, It seems that the Q + K model was sufficient to minimize false-positive associations, especially for some traits not influenced by population structure, which was consistent with other model simulations and comparisons Zhu et al., 2008). All the results indicate that model testing for quantitative traits is necessary for increasing the accuracy of association.
There was no previous study on alfalfa fiber trait mapping or association using molecular markers. A total of 124 alleles from 38 markers accounted for phenotypic variation with 2.46-9.66% were identified as associated with five fiber-related traits based on association analysis using the Q + K model (Table S2). These associated alleles were not consistent with the previous studies of Li et al. (2011) which may be explained by the different markers and different population used in the two studies. These was also observed in previous studies on linkage mapping and association mapping which found that different mapping populations detected different QTL regions (Agrama et al., 2007;Zhang et al., 2014).
Most of the loci that were associated with the five traits could only be identified in a specific environment, indicating that the fiber-related traits in the study are variously influenced by the environment. However, some stable associations were identified in our study, such as the allele m350_342 which located in chromosome 1 were repeatedly detected in two environments and associated with ADF, NDF, NDFN30h, and NDFD48h. Three markers, m115_183, m520_134, and m520_137, located in chromosome 7 were repeatedly detected in two environments and associated with NDFD48h. Markers with significant traits associated over multiple environments may indicate that the associated genes are more stably expressed (i.e., less environmental influence) (Ray et al., 2015). A low threshold, P < 0.005, was used to detect the marker-trait association due to the limited number of marker used in this study. If high-density DNA polymorphism datasets are used for association mapping, additional markers with high -Log (P-value) may be obtained. Among these associated alleles, different distributed patterns were observed among eight chromosomes in alfalfa. Eight alleles from seven markers which associated all five traits were observed in the chromosome 1, while only one allele of one marker which associated two traits was observed in the chromosome 6 ( Table S2). In the study, 20 markers were associated with more than one traits indicated these traits were correlative each other. Interestingly, the markers, m225, and m338, reportedly associated with yield (Li et al., 2011), was found associated with NDFD 30h and NDFD48h in this study, suggesting a correlation between these traits as assessed by the SSR or these traits are controlled by the same or neighboring regions in the genome. The explained phenotypic variance of all associated alleles ranged from 2.46 to 9.66%, with mean of 3.84%. The result indicated that the fiber-related traits were complex in nature, i.e., controlled by multiple genes without obvious major effects.
The present study is the first attempt in associating alfalfa fiber-related traits with the genotyping results derived from SSR markers using a diverse set of global collection of alfalfa genotypes. Our results demonstrated that this alfalfa panel is suitable for association mapping analysis targeting complex quantitative traits with optimal association models. The markers associated to the QTLs in the study can be effectively used in further alfalfa marker assisted breeding programmers for introgression of alleles into locally well adapted germplasm.

AUTHOR CONTRIBUTIONS
ZW designed the experiments performed the statistical analysis and drafted the manuscript. HQ performed SSR genotyping. HZ, GL, ZZ, RX, and YZ conducted the quality analysis. XW and HG revised manuscript. All authors have read and approved the final manuscript.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2016. 00331  Figure S1 | Histograms showing frequency distribution of five fiber-related traits in different environments in the study. The y-axis denotes the value of frequency, whereas the x-axis shows resultant groups of genotypes.