Genome-wide association study of plant color in Sorghum bicolor

Introduction: Sorghum plant color is the leaf sheath/leaf color and is associated with seed color, tannin and phenol content, head blight disease incidence, and phytoalexin production.
Methods: In this study, we evaluated plant color of the sorghum mini core collection by scoring leaf sheath/leaf color at maturity as tan, red, or purple across three testing environments and performed a genome-wide association study (GWAS) with 6,094,317 SNP markers.
Results and Discussion: Eight loci, one each on chromosomes 1, 2, 4, and 6 and two each on chromosomes 5 and 9, were mapped. All loci contained one to three candidate genes. In qPC5-1, Sobic.005G165632 and Sobic.005G165700 were located in the same linkage disequilibrium (LD) block. In qPC6, Sobic.006G149650 and Sobic.006G149700 were located in different LD blocks. The single peak in qPC6 covered one gene, Sobic.006G149700, which encodes a senescence regulator. We found a loose correlation between the degree of linkage and the tissue/organ expression of the underlying genes, possibly related to the plant color phenotype. Allele analysis indicated that none of the linked SNPs can differentiate between red and purple accessions, whereas all linked SNPs can differentiate tan from red/purple accessions. The candidate genes and SNP markers may facilitate the elucidation of plant color development as well as molecular plant breeding.


Introduction
Plant color in sorghum [Sorghum bicolor (L.) Moench] is defined as the stem/leaf sheath/leaf color (Rana et al., 1976; Reddy et al., 2008; Rooney, 2016; Fedenia et al., 2020) at maturity (Valencia and Rooney, 2009). Plant color is controlled by the P and Q genes. A sorghum plant with the P_Q_ genotype is purple, whereas P_qq is red and ppQ_ and ppqq are tan (Dykes et al., 2009; Valencia and Rooney, 2009; Dykes et al., 2011).
Plant color is associated with other phenotypes or consumer preferences. For example, white sorghum grain from tan plants is more desirable for human or animal consumption (Williams-Alanis et al., 1999; Funnell and Pedersen, 2006; Rooney, 2016). This is probably because tan plants tend to have lower tannin content compared with purple plants (Gourley and Lusk, 1978; Dykes et al., 2005). However, sorghum grains grown on plants with purple/red plant color do have higher levels of total phenols than those from tan plants (Dykes et al., 2005), although grains from some tan plants have the highest flavone (luteolin and apigenin) content (Dykes et al., 2009; Dykes et al., 2011). Tan plants tend to have lower head blight incidence caused by Fusarium moniliforme than red plants (Torres-Montalvo et al., 1992), but it is not clear if this is related to the high luteolin and apigenin contents. Du et al. (2010) have shown that flavones such as luteolin function as a phytoalexin against the sorghum anthracnose pathogen Colletotrichum sublineolum.
Sorghums with red/purple plant color produce the highest levels of 3-deoxyanthocyanidins (apigeninidin and luteolinidin) (Dykes et al., 2011), which are also phytoalexins induced by fungal attack (Snyder and Nicholson, 1990). The purple phenotype after fungal attack is determined by the production of two 3-deoxyanthocyanidins, apigeninidin and luteolinidin, which are not produced by the tan plants (Kawahigashi et al., 2016). The underlying P gene has been cloned using map-based cloning in progeny from a cross between purple Nakei-MS3B (PP) and tan Greenleaf (pp) cultivars; the gene was located in a 27-kb genomic region between markers CA29530 and SB25792 on chromosome 6. Four candidate genes identified in this region were similar to the maize leucoanthocyanidin reductase gene induced by wounding, and only the Sb06g029550 gene was induced in both cultivars after wounding. The Sb06g029550 protein was detected in Nakei-MS3B but only slightly in Greenleaf. A recombinant Sb06g029550 protein had a specific flavanone 4-reductase activity and converted flavanones (naringenin or eriodictyol) to flavan-4-ols (apiforol or luteoforol) in vitro (Kawahigashi et al., 2016).
In this study, we evaluated plant color using the sorghum mini core collection (MC; Upadhyaya et al., 2009) as the association panel. This panel has been extensively characterized, including its genetic structure and linkage disequilibrium (Wang et al., 2013) and its effectiveness for association mapping (Upadhyaya et al., 2013). Most importantly, the panel has been used to clone the pleiotropic gene SbSNF4-2 (SnRK1βγ2), which increases both biomass and sugar yield in sorghum and sugarcane (Upadhyaya et al., 2022). We scored leaf sheath/leaf color at maturity as tan, red, or purple across three testing environments in Tengqiao/Hainan and Fengyang/Anhui, China, performed association mapping with 6,094,317 SNP markers (Wang et al., 2021), and identified candidate genes strongly linked to plant color.

Plant materials and phenotyping
The accessions of the sorghum MC (Upadhyaya et al., 2009; Table S1) were grown in Tengqiao, Hainan, China, for two seasons (2021 and 2022) and in Fengyang, Anhui, China, for one season (2022). In both 2021 and 2022 in Tengqiao, Hainan, the plants were grown with a row spacing of 65 cm and a plant spacing within each row of 25 cm. A compound fertilizer (N:P:K = 15:15:15) and urea were applied before planting at 200 kg/ha and 120 kg/ha, respectively. The plot was irrigated once at the seedling stage and once at the stem elongation stage and was weeded before the three-leaf stage, during the four-to-six-leaf stage, and before anthesis. Pesticides were applied three times to control cutworms, aphids, and honeydew moths.
In Fengyang, Anhui, in 2022, the plants were grown with a row spacing of 50 cm and a plant spacing within each row of 25 cm. A compound fertilizer (N:P:K = 15:15:15) and urea were applied before planting at 180 kg/ha and 90 kg/ha, respectively. The plot was irrigated once at the seedling stage and once at the stem elongation stage and was weeded before the three-leaf stage, during the four-to-six-leaf stage, and before anthesis. Pesticides were applied three times to control cutworms, aphids, and honeydew moths.
At maturity in all three environments, plant color was scored for leaf/leaf sheath color as "1" (tan), "2" (red), or "3" (purple) (Figure 1) according to Rooney (2016).
FIGURE 1
The sorghum plant color phenotype from the mini core collection: tan, red, and purple according to Rooney (2016).
Wang et al. 10.3389/fpls.2024.1320844 Frontiers in Plant Science frontiersin.org

Genome-wide association study
Genome resequencing of 237 MC accessions (Supplementary Table S1) and the genome-wide association study (GWAS) were performed as described in Wang et al. (2021). GWAS was performed with 6,094,317 SNPs from Wang et al. (2021). The kinship matrix (K) was generated by EMMAX (Kang et al., 2010), which was also used to perform the GWAS analyses with the Q matrix calculated using STRUCTURE 2.3.4 (Pritchard et al., 2000) as a covariate. The modified Bonferroni correction was used to determine the genome-wide significance thresholds of the GWAS, based on a nominal level of α = 0.05, which corresponds to a P value of 8.2E-09, or a −log10(P) value of 8.08. At α = 0.01, these were 1.6E-09 and 8.78, respectively.
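The reported thresholds can be approximately reproduced by dividing the nominal level by the number of SNPs. The sketch below assumes a plain Bonferroni division (α / number of tests); the exact form of the "modified" correction is not specified in the text, so the function name and the calculation are illustrative only. Under this assumption the results agree with the reported values to within rounding in the second decimal place.

```python
import math

# Number of SNP markers used in the GWAS, as reported in the text.
N_SNPS = 6_094_317


def bonferroni_threshold(alpha: float, n_tests: int) -> tuple[float, float]:
    """Return the per-test P value cutoff and its -log10 transform.

    Assumption: plain Bonferroni (alpha / n_tests), which may differ from
    the modified correction used by the authors.
    """
    p_cutoff = alpha / n_tests
    return p_cutoff, -math.log10(p_cutoff)


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        p_cutoff, neg_log10_p = bonferroni_threshold(alpha, N_SNPS)
        print(f"alpha={alpha}: P={p_cutoff:.1e}, -log10(P)={neg_log10_p:.3f}")
```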

Candidate gene identification and allelic effect of linked SNPs
Candidate genes were identified using the reference genome Sorghum bicolor v3.1.1 (Paterson et al., 2009; McCormick et al., 2018) curated at Phytozome 13 (Goodstein et al., 2012; https://phytozome-next.jgi.doe.gov/). RNA-seq data (McCormick et al., 2018) for each candidate gene were downloaded from the site and are provided as Supplementary Table S2. To determine the allelic effect of selected SNPs linked to plant color, SNPs in each locus or in two loci were grouped together. Only accessions with a missing data rate of less than 5% for each group of SNPs were included. The original data are provided in Supplementary Tables S3-S8.
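The missing-data filter described above can be sketched as follows. The data layout (a dictionary mapping each accession to its genotype calls for one SNP group, with None marking a missing call), the function names, and the accession labels are hypothetical and are used only for illustration; the 5% cutoff is taken from the text.

```python
from typing import Optional

# Hypothetical layout: accession name -> genotype calls for one SNP group,
# with None marking a missing call.
Genotypes = dict[str, list[Optional[str]]]


def missing_rate(calls: list[Optional[str]]) -> float:
    """Fraction of missing calls; an empty list counts as fully missing."""
    if not calls:
        return 1.0
    return sum(call is None for call in calls) / len(calls)


def filter_accessions(genotypes: Genotypes, max_missing: float = 0.05) -> Genotypes:
    """Keep accessions whose missing rate is strictly below max_missing."""
    return {
        accession: calls
        for accession, calls in genotypes.items()
        if missing_rate(calls) < max_missing
    }


if __name__ == "__main__":
    # Toy data with made-up accession labels (not from the study).
    toy = {
        "ACC_A": ["CC"] * 40,                  # no missing calls, kept
        "ACC_B": ["CC"] * 39 + [None],         # 2.5% missing, kept
        "ACC_C": ["TT"] * 38 + [None, None],   # 5.0% missing, dropped
    }
    print(sorted(filter_accessions(toy)))
```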

Phenotype analysis
As described in the Introduction, plant color is controlled by multiple genes. This is reflected in the phenotyping results of this study. All accessions were consistently scored as either tan (9 accessions) or pigmented (228 accessions) in all three environments (2021_HN, 2022_HN, 2022_FY; Supplementary Table S1). However, 47 of the 228 pigmented accessions (20.6%) could not be consistently scored as either red or purple across the three environments. This indicates that the trait may be affected by the environment as well as by the combinations of multiple genes.
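The consistency check across environments can be illustrated with a short sketch. The scoring codes (1 = tan, 2 = red, 3 = purple) and the counts 47 and 228 come from the text; the function names and category labels are assumptions introduced here for illustration.

```python
# Scoring codes used in the study: 1 = tan, 2 = red, 3 = purple.
TAN, RED, PURPLE = 1, 2, 3


def classify(scores: tuple[int, ...]) -> str:
    """Summarize one accession's scores across environments."""
    unique = set(scores)
    if unique == {TAN}:
        return "tan"
    if unique == {RED}:
        return "red"
    if unique == {PURPLE}:
        return "purple"
    if TAN not in unique:
        # Always pigmented, but red in some environments and purple in others.
        return "pigmented, red/purple inconsistent"
    return "tan/pigmented inconsistent"


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print(classify((1, 1, 1)))
    print(classify((2, 3, 2)))
    # Share of pigmented accessions with inconsistent red/purple scores.
    print(percent(47, 228))
```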

Candidate gene identification
Only the genes closest to the respective linked SNPs are presented in Table 1. All loci contained one to three candidate genes (Table 1). The qPC5-1 and qPC6 loci were further examined by linkage disequilibrium (LD) analysis combined with the Manhattan plot (Figure 3). In qPC5-1, Sobic.005G165632 and Sobic.005G165700 were located in the same LD block as the QTL peak. In qPC6, Sobic.006G149650 and Sobic.006G149700 were located in different LD blocks. The qPC6 peak contained only one gene, Sobic.006G149700, which indicates that it should be the candidate gene for qPC6. The annotation information showed that Sobic.006G149700 encodes a senescence regulator/heavy metal-associated isoprenylated plant protein 34.

Allelic effect on plant color
We examined the allelic effect of all SNPs from the eight loci. For each locus, only accessions with a missing data rate of less than 5% were selected. In all loci, more purple accessions were observed than tan and red accessions combined, and no SNPs from the loci could differentiate between purple and red accessions, whereas most SNPs from all loci could differentiate tan from red/purple accessions (Supplementary Tables S3-S8). Three of the four SNPs most tightly linked to plant color in Table 1 (5:64621753, 5:64224755, and 6:51113980) are presented in Figure 4. Six tan accessions were identified for all three SNPs, whereas 7, 12, and 5 red accessions were identified, respectively. In contrast, 37, 71, and 55 purple accessions were identified, respectively, for the three markers. For both 5:64224755 (T/C) and 6:51113980 (G/C), IS20740 was the single heterozygote, and the T and G alleles, respectively, were dominant to the C alleles because the CC homozygotes at both SNPs were red or purple, whereas the heterozygote was tan. In the other five tan accessions, the TT and GG genotypes at the two SNPs were associated with tan plant color. Coincidentally, at all three SNPs, the red/purple accessions were all CC genotypes.
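The allele analysis above amounts to cross-tabulating genotype against plant color and asking whether two color groups share any genotype. The following is a minimal sketch under that reading; the function names and the toy records are hypothetical and do not reproduce the data in Supplementary Tables S3-S8.

```python
from collections import Counter

# One record per accession: (genotype at a SNP, plant color).
Record = tuple[str, str]


def genotype_by_color(records: list[Record]) -> Counter:
    """Count accessions for each (genotype, color) combination."""
    return Counter(records)


def separates(records: list[Record], group_a: set[str], group_b: set[str]) -> bool:
    """True if no genotype is shared between the two color groups."""
    genotypes_a = {g for g, color in records if color in group_a}
    genotypes_b = {g for g, color in records if color in group_b}
    return bool(genotypes_a) and bool(genotypes_b) and genotypes_a.isdisjoint(genotypes_b)


if __name__ == "__main__":
    # Toy records mimicking the pattern described for a T/C SNP:
    # TT and T/C accessions are tan, CC accessions are red or purple.
    toy = [
        ("TT", "tan"), ("TT", "tan"), ("TC", "tan"),
        ("CC", "red"), ("CC", "purple"), ("CC", "purple"),
    ]
    print(genotype_by_color(toy))
    print(separates(toy, {"tan"}, {"red", "purple"}))  # tan vs. pigmented
    print(separates(toy, {"red"}, {"purple"}))         # red vs. purple
```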

Discussion
White sorghum grain grown on tan plants is highly desirable as livestock feed and for human consumption (Awika et al., 2002). The tan/purple/red plant color is mainly controlled by the P and Q genes (Dykes et al., 2009; Valencia and Rooney, 2009; Dykes et al., 2011). In this study, we identified eight loci for plant color across three environments. Among these, the qPC6 locus at 51,113,980 bp on chromosome 6 is far from the peaks of the plant color QTLs mapped by Boyles et al. (2017), who mapped one locus each at 56,650,607 and 56,635,333 bp on chromosome 6 in the BTx642/BTxARG-1 and BTxARG-1/P850029 RIL populations, respectively. However, their two QTLs range from 49.9 Mb to 60.77 Mb and from 50.91 Mb to 60.6 Mb, both overlapping with qPC6. The peaks at 56,650,607 and 56,635,333 bp are close to the P gene (57,164,448 to 57,187,434 bp in Sorghum bicolor v3.1.1), which turns the leaves purple upon wounding or pathogen invasion (Kawahigashi et al., 2016). Boyles et al. (2017) used Sorghum bicolor v3.1, whereas Kawahigashi et al. (2016) used Sorghum bicolor v1.4 at www.plantgdb.org/SbGDB, which is no longer functional at the time of this writing. Therefore, the genomic locations reported in the two studies are not directly comparable, although Sb06g029550 (Sobic.006G226800) from Kawahigashi et al. (2016) is located from 57,175,961 bp to 57,178,219 bp on chromosome 6 in Sorghum bicolor v3.1. In qPC6, the highly associated SNPs were located only in the Sobic.006G149700 gene region (Figure 3), which is annotated as a senescence regulator. Its highest expression was in the leaf sheath at floral initiation, followed by seeds at maturity and juvenile leaf blades (Supplementary Table S2; McCormick et al., 2018). It is clear that qPC6 does not overlap with the P gene. This suggests that multiple genes are responsible for plant color in sorghum. Sobic.006G149700 is orthologous to Arabidopsis AtS40 (AT2G28400), whose mutation delayed leaf senescence (Fischer-Kilbienski et al., 2010).
As mentioned above, the RNA-seq expression data (Supplementary Table S2) of McCormick et al. (2018) may help identify candidate genes. In this study, plant color was scored for leaf/leaf sheath color as "1" (tan), "2" (red), or "3" (purple) according to Rooney (2016). Candidate genes physically close to the linked SNPs are highly expressed in leaves, leaf sheath, or both (Table 1; Supplementary Table S2). For example, in qPC1, Sobic.001G324900 is the only gene within 3 kb of the locus, and its highest expression is in the leaves with moderate expression in the leaf sheath; Sobic.002G416400 in qPC2 is the only gene between the linked SNPs and is highly expressed in the leaf sheath; in qPC5-1, three genes are within the locus, but only Sobic.005G165800 is highly expressed in both leaves and leaf sheath; as the only gene within the qPC6 locus, Sobic.006G149700 has its highest expression in the leaf sheath and leaves; and Sobic.009G031700 is the only gene in qPC9-1 with all linked SNPs in its coding region and is highly expressed in the leaves. The only exception is Sobic.005G167600 in qPC5-2, which is the only gene within the linked SNPs but is not highly expressed. In contrast, Sobic.004G200700 is 9 kb from qPC4 and is highly expressed only in the panicles, and Sobic.009G101700 in qPC9-2 is 16 kb away and ubiquitously expressed. These observations indicate a loose correlation between the degree of linkage and the tissue/organ expression of the underlying genes. It is possible that altered expression of these genes could impact plant color scored using leaves and leaf sheath.
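The coordinate comparison for qPC6 in the Discussion can be checked with simple interval arithmetic. The positions below are taken from the text (Sorghum bicolor v3.1 coordinates on chromosome 6, with the Mb values converted to bp); the constant and function names are illustrative assumptions.

```python
# Positions on chromosome 6 (bp), taken from the text.
QPC6_BP = 51_113_980
BOYLES_QTL_INTERVALS = {
    "BTx642/BTxARG-1": (49_900_000, 60_770_000),
    "BTxARG-1/P850029": (50_910_000, 60_600_000),
}
P_GENE_INTERVAL = (57_175_961, 57_178_219)  # Sobic.006G226800

Interval = tuple[int, int]


def contains(interval: Interval, position: int) -> bool:
    """True if the position lies inside the closed interval."""
    start, end = interval
    return start <= position <= end


def distance_to_interval(interval: Interval, position: int) -> int:
    """Distance in bp from a position to the nearest interval edge (0 if inside)."""
    start, end = interval
    if position < start:
        return start - position
    if position > end:
        return position - end
    return 0


if __name__ == "__main__":
    for population, interval in BOYLES_QTL_INTERVALS.items():
        print(population, "interval contains qPC6:", contains(interval, QPC6_BP))
    gap = distance_to_interval(P_GENE_INTERVAL, QPC6_BP)
    print(f"qPC6 lies {gap / 1e6:.2f} Mb from the P gene")
```

Under these coordinates, qPC6 falls inside both QTL intervals reported by Boyles et al. (2017) but is roughly 6 Mb from the P gene, which is consistent with the statement that qPC6 does not overlap with the P gene.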
The 3-deoxyanthocyanidins that accumulate in sorghums with red/purple plant color are also induced by fungal attack (Snyder and Nicholson, 1990). In the current study, we also identified one candidate gene associated with fungal resistance. In qPC5-1, Sobic.005G165700 encodes an antimicrobial peptide MBP-1 family protein precursor, which has been reported to be effective against both Gram-negative and Gram-positive bacteria as well as several filamentous fungi (Duvick et al., 1992). As stated above, Sobic.005G165800 is highly expressed in both leaves and leaf sheath, although its highest expression is in seed grain at maturity and in the panicles (Supplementary Table S2). There is no ortholog of this gene in Arabidopsis, and no orthologs in maize or rice have been studied. Therefore, the correlation between plant color and the antimicrobial peptide needs to be further investigated.

FIGURE 3
Manhattan plot (top) of the qPC5-1 (A) and qPC6 (B) loci on chromosomes 5 and 6 and their respective LD blocks (bottom). The X-axis represents the position in bp along the chromosome, and the Y-axis is −log10(P).

In conclusion, we mapped eight loci associated with sorghum plant color, one each on chromosomes 1, 2, 4, and 6 and two each on chromosomes 5 and 9. We identified several candidate genes that are highly expressed in the leaves/leaf sheath, one of which was Sobic.006G149700, encoding a senescence regulator. These findings may facilitate the elucidation of plant color development as well as molecular plant breeding.

TABLE 1
The plant color QTLs in sorghum detected in all three environments*.