Genome-Wide Association Mapping Identifies Novel Panicle Morphology Loci and Candidate Genes in Sorghum

Panicle morphology is an important trait in racial classification and can determine grain yield and other agronomic traits in sorghum. In this study, we performed association mapping of panicle length, panicle width, panicle compactness, and peduncle recurving in the sorghum mini core panel measured in multiple environments with 6,094,317 single nucleotide polymorphism (SNP) markers. We mapped one locus each on chromosomes 7 and 9 to recurving peduncles and eight loci for panicle length, panicle width, and panicle compactness. Because panicle length was positively correlated with panicle width, all loci for panicle length and width were colocalized. Among the eight loci, two each were on chromosomes 1, 2, and 6, and one each on chromosomes 8 and 10. The two loci on chromosome 2, i.e., Pm 2-1 and Pm 2-2, were detected in 7 and 5 out of 11 testing environments, respectively. Pm 2-2 colocalized with panicle compactness. Candidate genes were identified from both loci. The rice Erect Panicle2 (EP2) ortholog was among the candidate genes in Pm 2-2. EP2 regulates panicle erectness and panicle length in rice and encodes a novel plant-specific protein with unknown functions. The results of this study may facilitate the molecular identification of panicle morphology-related genes and the enhancement of yield and adaptation in sorghum.


INTRODUCTION
The sorghum inflorescence consists of a single panicle with many racemes and is an important determinant of grain yield (Hmon et al., 2013). Sorghum panicles are more extensively branched than those of maize and rice (Vollbrecht et al., 2005; Brown et al., 2006) and vary significantly in the number, length, and angle of primary branches as well as in the three-dimensional shape, size, and distribution of the seed (Li et al., 2020), especially compared to other major cultivated cereal crops (Brown et al., 2006). Therefore, sorghum is an excellent model for studying panicle morphology in panicle-bearing grasses. Sorghum panicles may be compact or open and can be up to 50 cm long and 30 cm wide (Doggett, 1988), and their morphology depends on the number and length of panicle branches and the number of aborted spikelets (Brown et al., 2006). Panicle morphology is an important criterion for the racial classification of sorghum. The compact panicle is typical of domesticated sorghum, especially elite high-yielding modern commercial varieties (Kimber, 2000; Brown et al., 2006; Dillon et al., 2007; OGTR, 2017), whereas undomesticated species are more likely to have open panicles (Harlan and de Wet, 1972). Plants with open or loose panicles are more likely to be small-seeded, which reduces grain yield (Desmae et al., 2016). However, compact panicles are also more prone to infection/infestation by grain mold (Sharma et al., 2010), webworm [Celama sorghiella (Riley)] (Hobbs et al., 1979), head bug (Calocoris angustatus Leth.), and head caterpillar (Helicoverpa armigera Hb.) (Sharma et al., 1994). As a result, race guinea, whose loose panicles help prevent grain molding, is more common in wet environments, and race durra, with compact panicles, is more common in dry environments (Harlan and de Wet, 1972; Doggett, 1988; Ayana and Bekele, 1998).
Despite its importance in yield and adaptation, the genetic control of panicle morphology is not fully understood. Approximately 300 panicle morphology-related quantitative trait loci (QTLs) have been cataloged by Mace et al. (2019) from previous studies. More recently, Girma et al. (2019) identified 15 regions across the sorghum genome associated with panicle compactness and shape, and Faye et al. (2019) identified 13 panicle compactness loci that colocalize with a priori candidate genes. Olatoye et al. (2020) also found a significant enrichment of QTLs colocalized with grass panicle-related genes such as maize Ramosa2 and rice Aberrant Panicle Organization1 (APO1) and TAWAWA1, but many QTLs did not colocalize with panicle gene orthologs (Olatoye et al., 2020). They suggested that global panicle diversity in sorghum is largely controlled by oligogenic, epistatic, and pleiotropic variations in ancestral regulatory networks. Zhou et al. (2019) detected 35 unique SNPs associated with variation in panicle architecture using a semiautomated phenotyping pipeline called Toolkit for Inflorescence Measurement (TIM). They also found colocalization with previously mapped panicle-related loci and identified nine candidate genes.
The objective of this study was to identify QTL related to panicle morphology and recurving of peduncles and determine the candidate genes that regulate panicle morphology in sorghum using a genome-wide association study (GWAS) with phenotyping data on sorghum panicle length and width in 11 environments at International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), India, panicle compactness in two environments in China, and 6,094,317 single nucleotide polymorphism (SNP) markers in the sorghum mini core (MC) collection panel (Upadhyaya et al., 2009).

MATERIALS AND METHODS
A total of 242 accessions of sorghum MC (Upadhyaya et al., 2009) were phenotyped in rainy and post-rainy seasons with or without irrigation at ICRISAT, Patancheru, India. The plants were grown in an alpha design with three replicates. Each single-row plot was 4 m long with a row spacing of 75 cm and plant spacing within a row of 10 cm. Ammonium phosphate (150 kg/ha) was applied before planting, and 100 kg/ha of urea was applied as a top dressing 3 weeks after planting. For the post-rainy season with irrigation, field plots were irrigated five times at equal intervals, each with 7 cm of water. Panicle length and width were measured in centimeters according to the International Board for Plant Genetic Resources IBPGR/ICRISAT (1993).
The genome resequencing of 242 MC accessions and SNP development were performed as follows. The reference genome was the sorghum BTx623 (Paterson et al., 2009) version 3.1.1 (https://phytozome-next.jgi.doe.gov/info/Sbicolor_v3_1_1), which was also used to identify candidate genes. Sequencing reads were mapped to the reference genome using BWA-MEM version 0.7.17 (Li, 2013) and sorted by SAMtools version 1.10 (Li et al., 2009). Duplicate reads were removed using Picard version 2.0.1 (http://broadinstitute.github.io/picard/). SAMtools flagstat was used to calculate the mapping percentage. Sequence variation detection and SNP calling were performed using the GATK version 4.17 functions HaplotypeCaller and SelectVariants (McKenna et al., 2010). SNPs were called with parameters "QD < 2.0, MQ < 40.0, FS > 60, SOR > 3.0, MQRankSum < −12.5, ReadPosRankSum < −8.0." SNPs were filtered with VCFtools version 1.16 (Li, 2013) using the parameters "max-missing 0.1, maf 0.05, maxDP 50, and minDP 10." Only SNPs on chr1-chr10 were used. This produced 6,094,317 SNPs for the GWAS analysis. Population structure was analyzed using Admixture version 1.3 (Alexander et al., 2015). The number of clusters (k) in MC was set to 2-15. Admixture version 1.3 was run for each k-value, using 489,339 SNPs (Supplementary Figure 1). The optimal k was determined to be 10, as the cross-validation (CV) error was the lowest at k = 10. This k-value was used to generate the Q matrix used in the GWAS, as described below.
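The six annotation thresholds quoted above have the form of GATK hard-filter expressions, so one plausible reading is that a SNP was discarded whenever any of them held. The Python sketch below encodes that reading as a plain predicate. It is an illustration under that assumption, not the authors' pipeline; the names `EXCLUDE_RULES` and `fails_hard_filter`, the dictionary input, and the example values are hypothetical.

```python
# Exclusion rules built from the thresholds quoted in the text.
# Assumption (not stated in the source): a SNP is dropped when ANY rule is true.
EXCLUDE_RULES = {
    "QD": lambda v: v < 2.0,
    "MQ": lambda v: v < 40.0,
    "FS": lambda v: v > 60.0,
    "SOR": lambda v: v > 3.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def fails_hard_filter(annotations):
    """Return True when any annotation present in the record triggers its rule.

    Annotations absent from the record are skipped rather than counted as failures.
    """
    return any(
        name in annotations and rule(annotations[name])
        for name, rule in EXCLUDE_RULES.items()
    )


if __name__ == "__main__":
    # Made-up annotation values for a single SNP, for illustration only.
    example = {"QD": 25.0, "MQ": 60.0, "FS": 1.2, "SOR": 0.8,
               "MQRankSum": 0.0, "ReadPosRankSum": 0.1}
    print(fails_hard_filter(example))
```

Skipping absent annotations is a design choice made here for simplicity; a real pipeline would need to decide explicitly how to treat records that lack a given annotation.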
The GWAS and linkage disequilibrium (LD) analysis were performed using the 6,094,317 SNPs retained after filtering for a minor allele frequency of >0.05 and a missing data rate of 10% or less in the population. The kinship matrix (K) was generated using EMMAX (Kang et al., 2010), and the GWAS was performed using EMMAX with the Q matrix. The modified Bonferroni correction was used to determine the genome-wide significance threshold of the GWAS, based on a nominal level of α = 0.05, corresponding to a raw P-value of 8.2 × 10⁻⁹ or a −log10(P) value of 8.08. Candidate genes were identified using
FIGURE 1 | Panicle morphology of the five major races in the association mapping panel (Upadhyaya et al., 2009). IS 7250 has loose panicles, and IS 4631, IS 4092, and IS 12937 have compact panicles, whereas IS 608 has semi-compact panicles.
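The significance threshold reported in the Methods can be reproduced by dividing the nominal α by the number of SNPs tested. The short Python sketch below does that arithmetic. It assumes a plain Bonferroni division (the text calls the correction "modified" without giving details), and the function name `bonferroni_threshold` is hypothetical.

```python
import math

N_SNPS = 6_094_317   # SNPs used in the GWAS, as stated in the text
ALPHA = 0.05         # nominal significance level, as stated in the text


def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value threshold under a plain Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_SNPS)
neg_log10 = -math.log10(threshold)

print(f"{threshold:.1e}")   # 8.2e-09
print(f"{neg_log10:.2f}")   # 8.09
```

The computed −log10(P) is about 8.086, which rounds to 8.09 and truncates to the 8.08 quoted in the text.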

Phenotyping
Panicle length and width were correlated, with Pearson's correlation coefficients ranging from 0.56 to 0.70 (significant at P < 0.001). Figure 1 shows variations in panicle morphology of the five primary sorghum races in the association mapping panel (Upadhyaya et al., 2009) from a field evaluation in Hainan in 2020. Based on the panicle compactness data from the Hainan 2020 environment, 64% of the MC accessions had compact panicles, 14% had semi-compact panicles, and 22% had loose panicles. In the 11 ICRISAT testing environments (Supplementary Table 1), panicle width was more variable across the environments than panicle length, as measured by the coefficient of variation (CV). The CV for panicle length ranged from 0.27 to 0.39, with a mean of 0.33, while that for panicle width ranged from 0.21 to 0.61, with a mean of 0.48 (Table 1; refer to Supplementary Table 2 for variance). Irrigation in Environments 3 and 5 affected panicle length and width compared to no irrigation in Environments 4 and 6, but not consistently. In the comparison of Environments 3 and 5, irrigation did not significantly affect the panicle length (P = 0.17) but decreased the panicle width by 1.42 cm on average (P = 0.0034). Between Environments 4 and 6, irrigation increased the panicle length by 1.9 cm on average (P = 0.0030) but decreased the panicle width by 1.85 cm on average (P = 0.000012). When panicle compactness was scored only as compact, semi-compact, and loose, panicle length and width were negatively correlated with panicle compactness, with r = −0.40 and −0.27, respectively, in Environment 1 at ICRISAT (the only ICRISAT environment in which panicle compactness was measured), and both correlations were significant at P < 0.001. Similarly, in the 2020 Hainan dataset, panicle length and width were negatively correlated with panicle compactness, with r = −0.42 and −0.47, respectively, and both were also significant at P < 0.001.
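The two summary statistics used in this paragraph, the coefficient of variation and Pearson's correlation coefficient, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: it assumes the CV is the sample standard deviation (n − 1 denominator) divided by the mean, which the text does not specify, and all function names are hypothetical.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (n - 1 denominator; an assumption here)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def coefficient_of_variation(values):
    """CV expressed as a proportion (standard deviation divided by mean)."""
    return sample_sd(values) / mean(values)


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ssx * ssy)
```

For example, `coefficient_of_variation(lengths)` applied to the panicle lengths of one environment would give one entry of the kind summarized in Table 1, and `pearson_r(lengths, widths)` would give the corresponding length-width correlation.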
These results indicate that loose panicles tend to be longer and wider, and compact panicles are shorter and narrower. Using 100 seed weight data obtained from the studies by Upadhyaya et al. (unpublished) and Li et al. (unpublished), we found that seed weight was positively correlated with panicle compactness both at ICRISAT (r = 0.33; significant at P < 0.001) and Hainan (r = 0.31; significant at P < 0.001), indicating that loose panicles often carry smaller seeds and that compact panicles carry larger seeds. This may have contributed to the positive correlation between panicle compactness and seed weight per panicle (r = 0.23; significant at P < 0.01). Since the untransformed data were used in this study, heritability may not be as accurately estimated (Fusi et al., 2014), and small-effect QTLs may not be identified by GWAS (Goh and Yap, 2009).

Genome-Wide Association Study
For a trait to be mapped, the association had to be strong in multiple environments with multiple SNPs and reach the Bonferroni-corrected P-value threshold of 8.2 × 10⁻⁹, or a −log10(P) of 8.08, in at least two environments, except for recurving peduncles, which were evaluated in only one environment. Using these criteria, we identified 11 QTLs: one on chromosome 4 for panicle length/width ratio, two for peduncle recurving with one each on chromosomes 7 and 9, and eight for panicle length and width, one of which, on chromosome 2, colocalized with panicle compactness (Table 2; representative SNPs from each locus are presented in Supplementary Table 3). For the eight panicle length and width QTLs, two each were on chromosomes 1, 2, and 6, and one each was located on chromosomes 8 and 10 (Figure 2, Supplementary Figures 2-9). Associations with P-values lower than the Bonferroni threshold were not observed in environments with a CV lower than the average, 0.33 and 0.48 for panicle length and width, respectively, except for panicle width in Environment 8 (Figure 2, Table 1, Supplementary Figures 2-9). Pm 2-1 and Pm 2-2 were both detected in the greatest number of environments with low P-values (Figure 2); Pr 7-1 and Pr 9-1 were associated with peduncle recurving with the lowest P-values (Supplementary Figure 9). We focused on these loci to identify candidate genes.
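The environment-count part of the mapping criterion described above can be expressed as a small check. The Python sketch below is a simplified illustration, not the authors' procedure: it considers only the most significant P-value of a locus in each environment and ignores the additional requirement of support from multiple SNPs. The function names, the dictionary layout, and the example P-values are hypothetical.

```python
THRESHOLD = 8.2e-9  # Bonferroni threshold reported in the text


def environments_passing(min_p_by_env, threshold=THRESHOLD):
    """Return the sorted environments whose best P-value reaches the threshold."""
    return sorted(env for env, p in min_p_by_env.items() if p <= threshold)


def is_mapped(min_p_by_env, min_envs=2, threshold=THRESHOLD):
    """Apply the 'significant in at least min_envs environments' rule."""
    return len(environments_passing(min_p_by_env, threshold)) >= min_envs


if __name__ == "__main__":
    # Made-up best P-values for one locus in three environments.
    locus = {"E1": 1e-10, "E2": 5e-9, "E3": 1e-6}
    print(environments_passing(locus))
    print(is_mapped(locus))
```

For a trait scored in a single environment, such as peduncle recurving, the same check would be called with `min_envs=1`.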
Frontiers in Plant Science | www.frontiersin.org

DISCUSSION
Our goal was to map major QTLs that are stable across environments and identify genes that can be used to improve economically important traits in sorghum and other species.
In this study, we mapped nine panicle morphology QTLs, including Pm 2-1 and Pm 2-2, and two peduncle recurving QTLs, Pr 7-1 and Pr 9-1. Pm 2-1, Pm 2-2, Pr 7-1, and Pr 9-1 were not previously identified by other groups (Faye et al., 2019; Girma et al., 2019; Zhou et al., 2019; Olatoye et al., 2020), nor were they identified in the 22 studies cataloged by Mace et al. (2019). The Pr 7-1, Pm 2-1, and Pm 2-2 loci contained four, four, and six genes, respectively. The RNAseq data available at Phytozome (McCormick et al., 2018) may provide insight into their functions. In addition, LD can be used to identify candidate genes mapped by GWAS (Sulem et al., 2008). Of the four genes in Pr 7-1, Sobic.007G072600 and Sobic.007G072901 had their highest expression in both the peduncle and the panicle at the floral initiation stage, while the highest expression of Sobic.007G072800 was in the leaf sheath. Sobic.007G072700 was not expressed in the peduncles. Sobic.007G072600 and Sobic.007G072901 are therefore both good candidates for the gene in this locus that causes recurving peduncles. Among the four genes in Pm 2-1, Sobic.002G355700 and Sobic.002G356000 were not expressed in peduncles, and Sobic.002G355900 was almost exclusively expressed in dry seeds. The remaining gene, Sobic.002G355800, was highly expressed in leaf sheaths, panicles, shoots, and stems, with slightly lower expression in peduncles, and resides inside an LD block (Figure 3). Therefore, Sobic.002G355800 is a candidate gene for the Pm 2-1 locus.
In the Pm 2-2 locus, Sobic.002G374100 is co-expressed with genes in an anthesis stage-specific co-expression subnetwork with very low expression in peduncles; Sobic.002G374500 is not expressed in panicles or peduncles, and the highest expression of Sobic.002G374600 is in leaves and shoots. The remaining three genes (Sobic.002G374200, Sobic.002G374300, and Sobic.002G374400) were highly expressed in the panicles and peduncles. Of these, Sobic.002G374400 shares 66% identity and 77% similarity with Erect Panicle2 (EP2) in indica rice and is the only gene inside an LD block (Figure 3). EP2 regulates panicle erectness, panicle length, and grain size in rice (Zhu et al., 2010). The EP2 mutants have shorter panicles, more vascular bundles, and thicker stems than wild-type plants, creating an erect panicle phenotype. EP2 encodes a novel plant-specific protein of unknown function that is localized to the endoplasmic reticulum (Zhu et al., 2010), and its sorghum ortholog is a candidate for the Pm 2-2 locus. This is plausible because panicle morphology regulation in sorghum and rice may have similar mechanisms (Chen et al., 2015). Previous studies have identified genes related to panicle/tassel morphology in the grasses. In maize, mutations in Ramosa produce a tassel resembling a loose sorghum panicle (Vollbrecht et al., 2005). The Ramosa1 transcription factor regulates long inflorescence branch architecture similarly in maize and sorghum but is absent in rice and heterochronically expressed in sorghum (Vollbrecht et al., 2005). Several panicle morphology-related genes have been identified in rice. A rice ncl-1, HT2A, and lin-41 (NHL)-domain-containing protein encoded by FUWA produces a more compact and erect panicle when the gene is mutated, and the mutant can be rescued by orthologs from sorghum and maize, indicating that the regulation of panicle morphology by this gene is evolutionarily conserved in rice, sorghum, and maize (Chen et al., 2015).
The OsLG1 gene product also regulates rice panicle compactness; its overexpression converts compact panicles to loose panicles. OsLG1 is a squamosa promoter-binding (SBP)-domain transcription factor that controls the development of rice ligules. Association analysis found that a SNP in the OsLG1 regulatory region led to a compact panicle architecture in cultivated rice during rice domestication (Zhu et al., 2013). Another rice panicle morphology gene, APO1, encodes an F-box protein. The overexpression of APO1 increases panicle branches and spikelets (Ikeda et al., 2007), whereas APO1 mutation reduces the number of secondary branches by >90% and the total number of flowers by >70% (Ikeda et al., 2005). The abovementioned studies of Ramosa in maize and FUWA in rice, as well as the fact that the bulk of maize tassel and sorghum panicle developmental activities are shared (Leiboff and Hake, 2019), demonstrate similarities and differences in inflorescence development in maize, rice, and sorghum. Further studies are required to confirm whether the candidate genes identified in this study play a role in panicle morphology in sorghum and to determine their possible effects on yield and related traits.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
HU, SS, CG, and RK phenotyped panicle length and width in the 11 environments at ICRISAT. LW, JZ, YL, and YZ performed phenotyping in Hainan 2017 and 2020. Y-HW scored panicle compactness in Hainan 2017 and 2020, as well as peduncle recurving in Hainan 2020 and wrote the manuscript. JL performed GWAS/LD analysis and normality test and calculated variance and broad-sense heritability. JL and Y-HW analyzed the GWAS results. All authors have read and approved the manuscript for publication.