Genome-Wide Association Studies for Five Forage Quality-Related Traits in Sorghum (Sorghum bicolor L.)

Understanding the genetic function of the forage quality-related traits, including crude protein (CP), neutral detergent fiber (NDF), acid detergent fiber (ADF), hemicellulose (HC), and cellulose (CL) contents, is essential for the identification of forage quality genes and selection of effective molecular markers in sorghum. In this study, we genotyped 245 sorghum accessions by 85,585 single-nucleotide polymorphisms (SNPs) and obtained the phenotypic data from four environments. The SNPs and phenotypic data were applied to multi-locus genome-wide association studies (GWAS) with the mrMLM software. A total of 42 SNPs were identified to be associated with the five forage quality-related traits. Moreover, three and two quantitative trait nucleotides (QTNs) were simultaneously detected among them by three and two multi-locus methods, respectively. One QTN on chromosome 5 was found to be associated simultaneously with CP, NDF, and ADF. Furthermore, 3, 2, 2, 5, and 2 candidate genes were identified to be responsible for CP, NDF, ADF, HC, and CL contents, respectively. These results provided insightful information of the forage quality-related traits and would facilitate the genetic improvement of sorghum forage quality in the future.


INTRODUCTION
Sorghum (Sorghum bicolor L.) is a popular crop worldwide, which is used a food source, animal fodder, and raw material for alcoholic beverages and biofuels in industries (Paterson et al., 2009). Most of the important agronomic traits are genetically controlled by quantitative trait loci (QTLs) (Zou et al., 2012;Boyles et al., 2017). For example, the forage quality is an important quantitative trait. Thus, understanding their genetic mechanism is essential for identifying the candidate genes and selecting effective molecular markers in sorghum breeding.
The forage digestibility and crude protein (CP) content are the main focus for forage sorghum breeding (Murray et al., 2008). Forage digestibility is mainly determined by the cellulose (CL), hemicellulose (HC), and lignin contents , which are important components of the neutral detergent fiber (NDF). On the other hand, acid detergent fiber (ADF) is a portion of sorghum fiber and is obtained from acid detergent-treated forage. The two types of fibers, NDF and ADF, are the two vital components of forage digestibility. Recently, the forage quality traits have been studied in sorghum and some related QTLs have been identified (Murray et al., 2008;Shiringani and Friedt, 2011;Li et al., 2015). However, these identified QTLs were observed to be less sensitive due to the limitation of linkage analysis based on bi-parental mapping populations.
Compared with the linkage analysis of bi-parental mapping populations, genome-wide association studies (GWAS), which is based on linkage disequilibrium (LD) and provided sufficient genetic background information, have become a powerful alternative for the investigation of quantitative traits. There are three main strategies for GWAS. Firstly, a generalized linear model (GLM) was proposed for the genetic analysis of the quantitative traits (Price et al., 2006), but it did not effectively control the polygenic background. Secondly, a mixed linear model (MLM) was elaborated to take into account the population structure and polygenic background using the pedigree relationship or marker information (Zhang et al., 2005;Yu et al., 2006). These methods involve a large calculation burden due to the tremendous number of existing markers. Therefore, a series of rapid detection methods were finally developed, such as EMMA (Kang et al., 2008), FaST-LMM (Lippert et al., 2011), GRAMMAR-Gamma (Svishcheva et al., 2012), ECMLM (Li et al., 2014), SUPER (Wang et al., 2014), BOLT-LMM (Loh et al., 2015), and FarmCPU . Although the above methods have been widely adopted, the complex traits controlled by multiple QTNs could not be effectively identified. To address this issue, Zhang's group has developed a series of multi-locus GWAS methods, including mrMLM (Wang S. B. et al., 2016), FASTmrMLM , FASTmrEMMA , ISIS EM-BLASSO , pLARmEB , and pKWmEB (Ren et al., 2018).
In our study, we utilized the advantageous multi-locus GWAS to investigate the sorghum forage quality-related traits. We genotyped 245 sorghum accessions by using 85,585 singlenucleotide polymorphisms (SNPs) and phenotyped them in the four environments. The data were analyzed by the multi-locus GWAS software, mrMLM.

Plant Materials
The 245 sorghum accessions (Table S1) included 238 mini-core collection sorghum and 7 breeding varieties. These accessions were planted in the Fengyang campus of Anhui Science and Technology University (Fengyang, China, 32 • 52 ′ N, 177 • 33 ′ E) and Tengqiao town of Hainan Province (Tengqiao, China, 18 • 24 ′ N, 109 • 45 ′ E) in 2015 and 2016. All the experiments in the four environments used a completely randomized block design with three replicates. The aboveground parts were harvested when 70% accessions were at the heading stage. The harvested plants were dried at 75 • C for three days. The plant material was then milled using a grinder and filtered using a 0.5 mm sieve.

Phenotypic Trait Evaluation and Data Analysis
Seven hundred and thirty-five sorghum samples (3 replicates) were measured for CP, CL, HC, NDF, and ADF using the traditional chemical methods, and simultaneously scanned for near-infrared (NIR) spectra with an Antaris TM II FT-NIR Analyzer (Thermo, USA). A model was established using TQ Analyst software based on the NIR spectra and the results of the chemical analysis. The samples were then scanned for NIR spectra, and their CP, CL, HC, NDF, and ADF were calculated using the model. The mean of the phenotypic data and the correlation coefficients were calculated using Microsoft Excel.

DNA Extraction and RAD Sequencing
Total DNA was extracted using the DNAsecure Plant Kit (Qiagen, Cat.No. DP320). All the samples were standardized to 50 ng/µL, and 10 µL of each sample was digested with the enzymes, PstI (CTGCAG) and MspI (CCGG), at 37 • C for 2 h and then at 65 • C for 20 min. The digested samples were ligated with the adapters from Illumina (San Diego, CA, USA). The ligated samples were then pooled using the same volume (10 µL) for PCR-amplification in a single tube. The fragment length was analyzed using a Bioanalyzer (Agilent), and the PCR products were quantified by a Qubit3.0 fluorometer (Invitrogen). The GBS library was run on an Illumina Hiseq2500 (San Diego, CA, USA).

RAD-seq Data and Population Structure Analysis
The sequencing reads of the 245 samples were extracted from the raw data of RAD-seq and filtered by using fastx_barcode_splitter and fastq_quality_filter with parameters (-q 20 -p 80 -Q 33) of fastx_toolkit-0.0.13.2 (http://hannonlab.cshl.edu/fastx_toolkit/). The high-quality sequencing data were aligned using BWA MEM (Li and Durbin, 2009). The software-samtools, mpileup, and bcftools , were then used to call the SNPs from the alignment files of the 245 samples; these were kept as the genotype of the sorghum population. These genotypic data were used to calculate the population structure using the fastSTRUCTURE software (Raj et al., 2014).

Genome-Wide Association Studies
The GWAS for the five forage quality-related traits (CP, CL, HC, NDF, and ADF) was performed using six methods, including mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO in the mrMLM software. The main model used in this study in the mrMLM software is as follows :y = Wα + Xβ + Zu + ε , where y is an n × 1 phenotypic vector of quantitative traits, and n is the number of accessions. W = (ω 1 , ω 2 , · · · , ω c ) is an n × c matrix of covariates (fixed effects), including a column vector of 1; the population structure or principal components can be incorporated into W . Moving on, α is a c × 1 vector of fixed effects, including the intercept, and X is an n × 1 vector of marker genotypes. β ∼ N(0, σ 2 β )is the random effect of putative QTN. Z is an n × m design matrix, and u ∼ MVN m (0, σ 2 g K) is an m × 1 vector of polygenic effects. K is a known n × n relatedness matrix. ε ∼ MVN(0, σ 2 e I n ) is an n × 1 vector of residual errors, and σ 2 e is residual variance. I n is an n × n identity matrix, and MVN denotes multivariate normal distribution. An LOD score of 3 was used as the critical threshold for significant QTNs for all the six methods.

Identification of Candidate Genes
Genes that were hit directly by the associated QTNs within a 50-kb stretch were selected to choose the candidate genes as described in Upadhyaya et al. (2016). The physical locations of the QTNs were recorded according to the assembly genome (Sorghum_bicolor_NCBIv3) and the annotation GFF file (https://www.ncbi.nlm.nih.gov/genome/108). The detailed functions of the corresponding genes were annotated by performing BLASTP search at the NCBI website, and the candidate genes were assigned to different biological processes based on the function of their homologs in other species in literature or with the help of data in the Conserved Domains Database. The selected candidate genes were associated with the main QTNs of the five traits if they made a contribution (r 2 ) greater than 5%.

Phenotype Analysis
Extensive phenotypic variations of CP, CL, HC, NDF, and ADF were observed in the 245 sorghum samples in the four environments, including two locations in 2 years (Fengyang and Tengqiao in 2015 and 2016, Table 1). The variation range of the five traits was 1.5 to 3.5-fold: the phenotype values of the CP content were 3.80 to 13.24% with 2.5 to 3.5-fold variation. The NDF content varied from 0.38 to 0.75 g/g with 1.5 to 1.9-fold variation, while the ADF content varied from 0.18 to 0.52 g/g with a 1.8 to 2.2-fold variation. Lastly, the HC and CL contents varied from 0.14 to 0.42 g/g and 0.12 to 0.45 g/g with 1.6 to 2.2-fold and 1.8 to 2.8-fold variations, respectively.
The correlation coefficients between a pair of traits were assessed. It was revealed that there were significant and positive correlations between ADF, NDF, CL, and HC. However, they correlated significantly but negatively with the CP phenotype, except for HC in 2015fy, 2015hn, and 2016fy and NDF in the 2016fy environments (Table S2). These results indicated that the four traits of ADF, NDF, HC, and CL could be genetically linked or that some genes could play pleiotropic roles in controlling these phenotypes.

RAD-Seq Genotyping And Population Structure
A total of 85,585 SNPs were identified in the genotypes of the 245 accessions using RAD-seq ( Table 2). Chromosome 1 had the most SNP markers (11,719), while chromosome 10 had the least (5,994). The highest SNP density was observed on chromosome 3 with 1.5 SNP markers per 10 kb, whereas the lowest density was on chromosome 7 with 0.9 SNP markers per 10 kb. The average density was 1.2 markers per 10 kb. Altogether, the genotyping results were of high quality in this research. The population structure was analyzed using the fastSTRUCTURE software. The results showed that the best value for the number of subpopulations was 5 (Figure 1), which was selected to perform further GWAS analysis.

GWAS Using Six Multi-Locus Methods
Six methods in the mrMLM software were used for the detection of QTNs. A total of 42 significant QTNs were detected for the five forage quality-related traits (CP, CL, HC, NDF, and ADF) across the four environments using six methods (Table 3). There were 5, 3, 3, 24, and 7 QTNs that were associated with CP, CL, HC, NDF, and ADF, respectively. Each trait was controlled by multiple QTNs. The 5 SNPs associated with the CP content were identified on chromosomes 2, 5, 7, and 9. The 3 SNPs associated with the CL content were present on chromosomes 2, 5, and 8, while the 3 SNPs associated with the HC content were located on chromosomes 1 and 9. The 24 SNPs associated with NDF were present on chromosomes 1, 2, 6, 7, 8, 9, and 10. Lastly, the 7 SNPs associated with the ADF content were present on chromosomes 3, 4, 5, 8, and 10. Among these QTNs, there were 4 significant QTNs, each of which was responsive for more than one trait. The three traits of ADF, CL, and NDF were associated with one QTN on chromosome 5 (RSS50197); both CL and NDF were associated with two QTNs (RSS21890 and RSS76122); ADF and CL were associated with one QTN (RSS68908) on chromosome 8. Among the above six methods, pLARmEB was the most powerful and accountable for the identification of the 24 QTNs that mainly contributed to the NDF content trait (17 QTNs); however, their contributions were less than what were detected by other methods, except for one major QTN (RSS17673), whose contribution was greater than 5% ( Table 3). The other methods of PKWmEB, ISIS EM-BLASSO, FASTmrMLM, mrMLM, and FASTmrEMA identified 12, 8, 8, 1, and 1 QTNs, respectively. About 43% (13 of 30) of these SNPs included the major QTNs (r 2 > 5%). Besides, 3 QTNs (RSS50197, RSS21890, and RSS1510) were detected simultaneously by 3 methods, and another 5 QTNs (RSS35476, RSS83457, RSS76122, RSS22092, and RSS17673) were identified simultaneously by 2 methods. The remaining QTNs were detected by a single method, but most of them were considered as reliable because of the high thresholds at which they were detected.

Identification of Candidate Genes
The assembled sorghum genome and the annotation file from NCBI were used to annotate the genes associated with the significant QTNs. There were 14 candidate genes for five forage quality-related traits. The NDF and CP content traits were associated with five and three candidate genes, respectively. The remaining 6 genes were related to the CL, HC, and ADF content traits with each trait being associated with two genes ( Table 4).
For the CP content trait, one candidate gene that was associated with the major QTN (RSS17673) encoded a serine/threonine-protein kinase (Sobic.002G217100), which was consistent with a previous study that concluded that serine/threonine-protein kinases are involved in signal cascade for nitrogen metabolism in plants (Champigny, 1995). Besides, two candidate genes were identified for the CP content on chromosomes 2 and 5 with one gene encoding a cysteine proteinase and the other encoding an uncharacterized protein. In addition, one main QTN associated with the CL content trait on chromosome 2 was identified, and the associated candidate gene encoded a kinesin-like protein. The kinesin protein is reported to be involved in the deposition of CL during secondary growth of fiber cells in Arabidopsis (Kong et al., 2015). Furthermore, 5 main QTNs were detected in association with the NDF content; two of these (RSS21890 and RSS50197) were co-localized with those for the CL content trait. Therefore, the same two candidate genes were identified for the NDF and CL content (Sobic.005G215300 and Sobic.002G390800). For the ADF content trait, 2 main QTNs were detected on chromosomes 3 and 10, where both candidate genes encoded a bHLH transcription factor (Sobic.003G272200 and Sobic.010G172100).

DISCUSSION
Genome-wide association study is an important alternative for mapping quantitative traits. It has been applied rapidly and extensively in plant research. These methods have been widely adopted, but only a few QTNs for each complex trait have been identified. In this study, we implemented the latest multi-locus GWAS methods available in mrMLM (Wang S. B. et al., 2016;Tamba et al., 2017;Wen et al., 2017;Zhang et al., 2017;Ren et al., 2018), which can effectively overcome the above issue and actively detect the QTNs associated with the quantitative traits. Six methods in the mrMLM software were used to identify the QTNs of five forage quality-related traits in sorghum. Of these methods, pLARmEB detected the most significant QTNs, but most of them contributed insignificantly to heritability ( Table 3).
Most of the significant QTNs associated with the NDF content, detected using pLARmEB, were observed to be in the 2015hn (13 QTNs) and 2015fy (4 QTNs) environments (  located in the tropics and subtropics, respectively, where the environment is particularly different in different climatic zones. The previous study has revealed that the climatic conditions, including temperature, water availability, and soil, are important factors which affect the forage quality of sorghum (Hussin et al., 2007). In our study, the QTN RSS50197 associated with the ADF, CL, and NDF traits was uniquely detected in the same environment of 2015fy by using three GWAS methods. The above results revealed the influence of environment in QTN detection. However, the latest methods of multi-locus GWAS applied in our study are currently unable to detect the QTN-byenvironment interaction. Thus, we hope that in the future new methods can be developed by the theoretical researchers. According to the GWAS analysis, 5, 3, 3, 7, and 24 QTNs were identified for CP, CL, HC, ADF, and NDF content, respectively. Of the 5 candidate loci for the CP content, 2 were already identified in the previous studies. The locus on chromosome 9 was mapped in the same region by Murray et al. (2008) and Li et al. (2015) in sorghum as well. Of the 3 candidate loci for the CL content, 2 were identified in the same region on chromosomes 2 and 8 by Murray et al. (2008) and Shiringani and Friedt (2011). Similarly, of the 7 loci for the ADF content, 2 were mapped on chromosome 4, which was in agreement with the report of Shiringani and Friedt (2011). As for the 24 loci for the NDF content, the 2 loci on chromosome 6 and 1 loci on chromosome 8 were also identified by Shiringani and Friedt (2011). More importantly, several QTNs that were detected by the six methods in this study were novel identifications for forage quality-related traits in sorghum.
The QTLs for the NDF or ADF content co-localized with those for the CL or HC content, which has been reported previously in sorghum. Cardinal et al. (2003) reported colocalization of QTLs that are associated with the cell wall components, such as lignin, NDF, and ADF in stalks of maize. Murray et al. (2008) and Shiringani and Friedt (2011) also found colocalization of QTLs associated with the CL, HC, NDF, and ADF content traits in sorghum by QTL mapping. In this study, we detected 4 colocalized QTNs: 1 for three traits and 3 for two traits. All of these QTNs were associated with NDF or ADF and with CL or HC. NDF is mainly composed of CL, HC, and lignin, while ADF is composed of CL and lignin. The difference between NDF and ADF is whether they have HC as a component or not. Furthermore, we found that NDF and ADF significantly correlated with CL or HC. It is reasonable that these QTNs were co-localized.
Both NDF and ADF include CL and lignin. There are a series of reports about the biosynthesis and signaling pathways of CL and lignin in plants (Kim et al., 2013;McNamara et al., 2015;Yoon et al., 2015;Chezem and Clay, 2016). In this study, we identified 5 and 2 candidate genes for the NDF and ADF content traits, respectively. Of these candidate genes, 1 gene (Sobic.001G378300) encoded a sucrose synthase, which is an integral component of the CL synthesis mechanism. Gerber et al. (2014) reported that deficient sucrose synthase activity in developing wood does not specifically affect the CL biosynthesis but causes an overall decrease in the cell wall polymers. Furthermore, Poovaiah et al. (2014) reported that the lignin content increases in all the transgenic switchgrass lines, where sucrose synthase (PvSUS1) was overexpressed.
Lignin, CL, and HC are the main components of secondary cell walls (Zhong et al., 2011). Secondary cell wall biosynthesis is positively regulated by NAD and MYB transcription factors (Zhong and Ye, 2014;Chezem and Clay, 2016). Moreover, studies have also identified several transcription factors (e.g., WRKY, ERF, and bHLH) that regulate the biosynthesis of secondary walls (Kim et al., 2013;Taylor-Teeples et al., 2015;Chezem and Clay, 2016). In this study, we identified a candidate gene encoding a bHLH transcription factor for CL and two bHLH genes for ADF. These transcription factors might also be involved in the regulation of CL or lignin biosynthesis. The function of the candidate genes identified in this work needs to be studied further by transformation experiments in the future.