Genetic Diversity Relationship Between Grain Quality and Appearance in Rice

Grain quality is an important breeding objective in rice, and grain appearance also affects commercial value in the market. The aim of this study was to dissect rice grain quality and appearance traits, namely gelatinization temperature (GT), amylose content (AC), grain protein content (GPC), pericarp color (PC), length/width ratio (LWR), and grain volume (GV), at the phenotypic and genetic levels, as well as the relationships among them. A genome-wide association study (GWAS) was used to identify the quantitative trait loci (QTLs) associated with the target traits using the mixed linear model (MLM) and Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK) methods. In general, AC was negatively correlated with GPC and GV and positively correlated with LWR and PC. GPC was positively correlated with LWR. Using the rice diversity panel 1 (RDP1) population, we identified 11, 6, 2, 7, 11, and 6 QTLs associated with GT, AC, GPC, PC, LWR, and GV, respectively. Five germplasm lines with superior grain quality and appearance, suitable as basic breeding materials or for improvement, were identified. Notably, an F-box gene, OsFbox394, was located in the linkage disequilibrium (LD) block of qLWR7-2; it is specifically expressed in endosperm and seed tissues, suggesting that it may regulate seed development in rice. In addition, different haplotypes of OsHyPRP45 showed significant differences in AC, indicating that this gene may be related to AC in rice grain.


INTRODUCTION
Rice (Oryza sativa L.) is one of the most widely cultivated cereal crops in the world and provides the staple food for over half of the world population (Mbanjo et al., 2020). With advances in technology and improvements in quality of life, consumers are seeking food with high nutritional and appearance quality. Eating and cooking qualities (ECQs) and grain protein content (GPC) are the main factors that determine rice grain quality. ECQ can be further dissected into amylose content (AC), gel consistency (GC), and gelatinization temperature (GT). AC can be divided into five classes, namely, waxy (0-2%), very low (3-9%), low (10-19%), intermediate (20-25%), and high (>25%). Rice grains with an AC of 16-20% are the most popular type in markets and meet consumer demand for ECQ (Song et al., 2019). GT is usually measured by the alkali spreading value (ASV), which is evaluated by the extent of dispersal of whole milled rice grains in a dilute alkali solution (Pang et al., 2016). Protein is the most vital nutritional compound in rice grain, and GPC is around 8%, which is lower than in other cereal grains (Chen et al., 2018). Glutelin, albumin, globulin, and prolamin are the four components of rice seed storage protein. Rice grain has a higher percentage of glutelin, which is more easily digested by humans, making it a superior source of high-quality protein. Pericarp color (PC), length/width ratio (LWR), and grain volume (GV) are important features that influence the appearance quality of rice. The red and purple pigments in rice grains are mainly proanthocyanidin and anthocyanidin, respectively. Both compounds have antioxidant, antidiabetic, antihyperlipidemic, and anticancer activities, which benefit human health (Mbanjo et al., 2020). The volume and shape of the grain are key factors that determine rice yield and market value.
In general, the slender shape of rice is usually preferred by customers in the United States, Southern China, and South and Southeast Asian countries, while the short and round rice grains cater to the consumers in Northern China, Korea, and Japan (Huang et al., 2013).
In a previous study, 44k single nucleotide polymorphism (SNP) variants were used to identify 34 QTLs in rice diversity panel 1 (RDP1) accessions (Zhao et al., 2011). The release of the High-Density Rice Array (HDRA), containing 700k SNPs (McCouch et al., 2016), allows more precise mapping of traits in the RDP1 population. In this study, the distribution and relationships among AC, GT, GPC, PC, LWR, and GV were analyzed, and two genome-wide association study (GWAS) methods (i.e., MLM and BLINK) were used to identify the QTLs associated with the target traits. Superior lines with high GPC and satisfactory ECQ were selected for breeding improvement. In addition, candidate genes controlling AC and LWR in rice were detected; further experiments are required to validate their function.

Plant Materials
The RDP1 consists of 421 purified homozygous varieties (Eizenga et al., 2014), including the indica (IND), aus (AUS), tropical japonica (TRJ), temperate japonica (TEJ), and aromatic (ARO) subgroups. Among them, 406 have been genotyped with the HDRA and were used in this study. Detailed information on the varieties is listed in Supplementary Table 1.

Determination of the Grain Quality and Appearance Quality
Six traits widely used to characterize rice grain quality and appearance were examined in this study: grain amylose content (AC), alkali spreading value (ASV), grain protein content (GPC), grain length/width ratio (LWR), grain volume (GV), and pericarp color (PC). All trait data were obtained from the USDA website (https://www.ars.usda.gov/southeast-area/stuttgart-ar/dale-bumpers-national-rice-research-center/docs/rice-diversity-panel-1-rdp1/).

SNP Data Set and Population Structure
The HDRA (700k SNPs) file was downloaded from the Rice Diversity website (http://www.ricediversity.org/data/). Detailed information on the samples used in this study is listed in Supplementary Table 1. The genotype data set was preprocessed following the procedure of a previous study (Zhong et al., 2021).

Genome-Wide Association Study
The GWAS was performed on 406 rice varieties from RDP1 with 411,066 high-quality SNPs. Two GWAS methods (i.e., MLM and BLINK) were employed to evaluate the trait-SNP associations for grain quality and appearance traits using the Genomic Association and Prediction Integrated Tool (GAPIT) (Lipka et al., 2012). The first four principal components (PCs) were used as covariates to correct for population stratification in RDP1. To control the Type I error (i.e., false positives), we set a p-value threshold of 2.20E-07, calculated as 0.05/n, where n is the effective number of independent markers. The effective number of independent markers (n = 227,753) was calculated using GEC software version 0.2. The Manhattan and Q-Q plots for GWAS were drawn using the R package qqman (Turner, 2018).
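The significance threshold described above is a Bonferroni-style correction over the effective number of independent markers. The arithmetic can be reproduced with a minimal Python sketch; the function name is illustrative only and is not part of GAPIT or GEC.

```python
def significance_threshold(alpha: float, n_effective: int) -> float:
    """Bonferroni-style genome-wide threshold: alpha / effective number of tests."""
    if n_effective <= 0:
        raise ValueError("n_effective must be positive")
    return alpha / n_effective


# Values reported in the text: alpha = 0.05 and n = 227,753 effective markers.
threshold = significance_threshold(0.05, 227_753)
print(f"{threshold:.2E}")  # prints 2.20E-07
```

Using the effective number of markers rather than all 411,066 SNPs gives a less conservative threshold, because SNPs in strong LD are not independent tests.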

Mining Candidate Genes and Annotation of SNPs
The QTLs identified by the MLM and BLINK models provide important information for understanding the genetic architecture of grain quality and appearance in rice. To explore candidate genes responsible for each QTL, we defined local LD blocks with the CI method (Gabriel et al., 2002), and all genes in the blocks were extracted for further analysis. For the QTLs for which no LD block could be defined, we extracted the genes within 100 kb upstream and downstream of the leading SNPs. The gene annotation file was downloaded from The Rice Annotation Project Database website (https://rapdb.dna.affrc.go.jp/index.html). Then, the SnpEff software (Cingolani et al., 2012) was used to annotate the effects of the variants in the candidate genes.
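The fallback rule for QTLs without an LD block (genes within 100 kb on either side of the leading SNP) amounts to an interval-overlap query. The following is a minimal Python sketch under stated assumptions: the `Gene` record, the function name, and all coordinates are hypothetical and do not come from the RAP-DB annotation file.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_near_snp(genes, chrom, pos, flank=100_000):
    """Return IDs of genes overlapping [pos - flank, pos + flank] on chrom."""
    lo, hi = max(1, pos - flank), pos + flank
    return [g.gene_id for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical annotation records and SNP position, for illustration only.
toy_genes = [
    Gene("geneA", "chr7", 22_050_000, 22_053_000),  # inside the window
    Gene("geneB", "chr7", 22_212_000, 22_215_000),  # overlaps the upper edge
    Gene("geneC", "chr7", 22_300_000, 22_304_000),  # beyond the window
    Gene("geneD", "chr3", 22_100_000, 22_102_000),  # different chromosome
]

hits = genes_near_snp(toy_genes, "chr7", 22_113_010)
print(hits)  # prints ['geneA', 'geneB']
```

A gene is kept when any part of it overlaps the window, which is one reasonable reading of "genes in the 100 kb upstream and downstream"; requiring the whole gene to lie inside the window would be a stricter alternative.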

Grain Quality and Apparent Variant Among the Rice Subpopulations
Based on the principal component analysis (PCA), we divided the varieties into six subgroups [i.e., IND, AUS, TEJ, TRJ, ARO, and ADM (Admixture)]. We analyzed the distribution of the six traits among the subgroups, and all traits except protein content differed significantly (p < 2.2E-16) among the subgroups, indicating that protein content was less variable than the other traits (Figure 1). TEJ and IND had higher GT, while the TRJ and AUS subgroups exhibited lower GT (Figure 1A). AUS and IND showed higher AC, with AUS displaying the highest. AC differed within the japonica subspecies (i.e., TRJ and TEJ): TRJ had relatively higher AC, while TEJ had the lowest AC (Figure 1B).
GPC is one of the most valuable nutritional components of rice grain; the ARO subgroup showed the highest mean GPC, while the AUS and IND subgroups showed relatively lower values (Figure 1C). PC is another important character in rice. The japonica subgroups (i.e., TEJ and TRJ) were more likely to produce white or light brown rice, while the AUS subgroup usually exhibited darker colors (i.e., red or brown) (Figure 1D). By analyzing seed volume and seed LWR, we found that the ARO subgroup had relatively smaller and more slender grains, while the TEJ subgroup had larger and rounder grains (Figures 1E,F).
Taken together, the ARO subgroup had the highest GPC, with darker color and slender shape, and could therefore provide both protein and antioxidant compounds. However, its seed size was the smallest and its AC was relatively high, both of which need to be improved. GV in TEJ was significantly larger than in the other subgroups, suggesting that some genes may play a key role in seed development in TEJ. Thus, by mapping the genes associated with AC and GV, we could decrease the AC and increase the GV of targeted elite varieties from ARO.
In the box plots, whiskers represent 1.5 times the interquartile range of the data, the upper and lower edges of each box mark the interquartile range, and the thick black bar in the center represents the median. One-way ANOVA was used to determine whether there were statistically significant differences among the subgroups, and the p-value is shown in the left corner of each panel. The total population acts as the reference group, and each subgroup is compared with it. Symbols above each subgroup indicate significant differences: *p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001. The dashed black line represents the mean value of the whole population.
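The one-way ANOVA used for the subgroup comparisons tests whether the between-group variance is large relative to the within-group variance. The sketch below computes the F statistic in plain Python on made-up numbers; it is an illustration of the test, not the analysis pipeline used for the figures, and the toy values are not trait data from RDP1.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group's mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up values for three hypothetical subgroups.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)  # prints 3.0 2 6
```

Converting F to a p-value requires the F distribution; a library routine such as `scipy.stats.f_oneway` returns both the statistic and the p-value.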

Grain Quality and Apparent Variant Among the Rice Groups From Different Geographic Regions
The 406 varieties examined in this study originate from 13 regions around the world. Detailed information is listed in Supplementary Table 1. Analysis of the distribution of the six traits among the regional groups showed significant differences for all traits, with p-values ranging from 4.7E-05 (GPC) to < 2.2E-16 (GV) (Figure 2). The materials from East Asia, East Europe, and West Europe showed higher GT with lower AC, and most of them were round in shape. However, GV differed significantly among the three areas: seeds from Europe (i.e., East Europe and West Europe) were clearly larger than those from East Asia. In addition, seeds from Central Asia had the highest mean GPC with suitable AC (median 15.60%), but their grain size was not the largest and needs to be improved to increase yield. The varieties from South Asia had lower GPC, the smallest GV, and the highest AC, and therefore do not appear to be ideal basic materials for breeding. However, the materials from South Asia were relatively darker in color than those from other regions, indicating more pigment accumulation in these seeds. Thus, these seeds are good materials for studying the genes that regulate grain color, which could be utilized to enhance the antioxidant activities of elite cultivars.

Variation of Grain Quality in Relationship With Apparent Traits
We compared GPC across the different classes of AC and LWR. In general, GPC varied with AC and LWR. Seeds with the highest GPC usually exhibited very low AC (Figure 3A) and a slender seed shape (Figure 3B). Grains with higher AC tended to have lower GPC. The intermediate seed shape (i.e., LWR of 2-3) had the lowest GPC. The relationships of AC with LWR and PC were also examined. Seeds with a slender shape (Figure 3C) and a darker color (Figure 3D) usually had higher AC; the darker the PC, the higher the AC in rice grain. Round-shaped, white rice had suitable AC and is the type most favored by customers. Finally, we compared GV across the different classes of AC and PC. Seeds with low AC had the largest grain size (Figure 3E), and smaller grains usually exhibited higher AC (Figure 3E). GV also differed with PC at the overall population level (Figure 3F). White materials were larger than colored rice, and brown rice was the smallest in the current population. Spearman's correlation analysis was also performed to study the relationships among the traits, and similar results were obtained (Supplementary Table 2).
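Spearman's correlation, mentioned above, is the Pearson correlation computed on ranks, which makes it suitable for monotonic but non-linear trait relationships. A self-contained Python sketch follows; the trait values are invented for illustration and are not taken from Supplementary Table 2.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Invented AC and GPC values (percent) for five hypothetical accessions.
toy_ac = [12.0, 18.5, 22.1, 25.3, 27.0]
toy_gpc = [9.8, 9.1, 8.4, 8.0, 7.2]
rho = spearman_rho(toy_ac, toy_gpc)
print(rho)  # prints -1.0 (perfectly monotone decreasing toy data)
```

For real data, `scipy.stats.spearmanr` provides the same coefficient together with a p-value.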

Genome-Wide Association Study of Grain Quality and Apparent Traits
All six traits were analyzed using the two GWAS models (i.e., MLM and BLINK) to identify QTLs. Both the PCA and the relatedness matrix were incorporated in the MLM to reduce the false-positive rate. The BLINK model approximates the maximum likelihood using the Bayesian Information Criterion in a fixed-effect model to reduce the amount of computation. Specifically, 10, 6, 2, 7, 11, and 6 loci were detected in association with GT, AC, GPC, PC, LWR, and GV, respectively (Table 1; Figures 4, 5).
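For orientation, a mixed linear model that combines principal components with a relatedness matrix is commonly written in the form below. This is the standard textbook form, shown as a sketch; the symbols are introduced here for illustration, and the exact parameterization used by GAPIT may differ.

```latex
\[
  \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{s}\,\alpha + \mathbf{Z}\mathbf{u} + \mathbf{e},
  \qquad
  \mathbf{u} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_g^{2}\mathbf{K}\right),
  \quad
  \mathbf{e} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_e^{2}\mathbf{I}\right)
\]
```

Here y is the vector of phenotypes, X holds the fixed covariates (in this study, the first four principal components), s is the genotype vector of the SNP being tested with effect alpha, Z maps accessions to the random polygenic effects u, and K is the relatedness matrix. The fixed covariates absorb population structure, while K accounts for finer-scale relatedness among accessions.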
Among the 10 loci associated with GT, qGT6-2 was located 60 kb from the reported gene OsSSIIa and was detected by both methods with significant p-values (i.e., 2.15E-11 from MLM and 3.52E-10 from BLINK). qGT2-1, on Chromosome 2, was another significant QTL identified by both the BLINK and MLM models. Eight more loci were detected only by BLINK. For AC, two QTLs were detected by both models. qAC6-1 was identified with extremely significant p-values (i.e., 1.57E-22 from MLM and 1.19E-22 from BLINK) and lies near the reported gene OsGBSSI (Wang et al., 1995). In addition, a novel QTL, qAC1-2, was also detected by both methods. GPC is another important grain quality trait that determines nutritional value. Only two QTLs (i.e., qGPC11-1 and qGPC12-1) were identified by the BLINK model, and neither was detected by the MLM model.
A total of 24 loci were associated with rice grain appearance traits: 7, 11, and 6 for PC, LWR, and GV, respectively. Colorless rice is the most widely consumed type worldwide, while colored rice has been reported to show unique activity in reducing the risk of developing chronic diseases, such as cardiovascular disease and Type 2 diabetes (Tantipaiboonwong et al., 2017). Five QTLs (i.e., qPC4-1, qPC7-1, qPC7-2, qPC7-3, and qPC8-1) and two QTLs (i.e., qPC2-1 and qPC4-1) were detected in association with the color of the rice seed by the MLM and BLINK models, respectively, while there was no overlapping QTL between the two models. Notably, qPC7-1 overlapped with the gene Rc, which has been reported to regulate proanthocyanidin pigmentation (Furukawa et al., 2007). Different grain shapes cater to various consumers around the world. We detected 11 loci tightly associated with grain shape in this study. Five of them (i.e., qLWR3-1, qLWR3-2, qLWR5-1, qLWR5-2, and qLWR7-2) were detected by both models.
Seed size (GV) is an important factor related to yield, and increasing grain size is one of the breeding strategies used to improve rice yield. Six QTLs were detected in this study with the two methods. qGV3-2 was identified by the BLINK method with a significant p-value of 4.41E-09, while qGV5-1 was detected by the MLM model. Four more significant loci were identified on Chromosomes 3, 6, 7, and 12, respectively.
In summary, a total of nine QTLs (qGT2-1, qGT6-2, qAC1-2, qAC6-1, qLWR3-1, qLWR3-2, qLWR5-1, qLWR5-2, and qLWR7-2) associated with GT, AC, and LWR were identified by both the MLM and BLINK methods. To further study the favorable SNPs in each QTL, we compared the trait distributions between the alleles of the leading SNPs. All comparisons were significant except that for qAC1-2 (Figure 6). qGT2-1 showed a significant association with GT. The minor allele (i.e., the second most common allele, A) of SNP S2_20220266 showed significantly lower GT than the major allele (T) in the whole population (Figure 6A). In addition, the minor allele of S2_20220266 was found only in TEJ and TRJ. In TRJ, individuals with A had significantly lower GT than those with T (Supplementary Figure 2A). qAC1-2 was associated with AC by both the MLM and BLINK methods. The minor allele (G) of SNP S1_3329456 did not differ significantly in AC from the major allele (A) in the whole population (Figure 6C). We then compared the two alleles in the subpopulations, and the samples with minor [...] (Supplementary Figure 2B). Three novel QTLs (i.e., qLWR3-1, qLWR5-2, and qLWR7-2) were associated with LWR by both models in this study. All the leading SNPs of these QTLs displayed significant differences in the whole population (Figures 6E,H,I). Of the 27 accessions carrying the minor allele (G) of SNP S3_5210399, 25 were from TRJ, and this allele is presumed to contribute to a slender seed shape (Supplementary Figure 2C). Similarly, the minor allele (A) of SNP S5_28480050 contributed to a slimmer grain shape in the ARO and IND subgroups (Supplementary Figure 2D). In addition, a total of 42 minor alleles (T) of S7_22113010, which contribute to LWR, were identified, and the minor allele was present in the AUS, IND, and TRJ subgroups (Supplementary Figure 2E).
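The allele comparisons above start from a simple grouping step: split the accessions by their allele at the leading SNP, identify the major and minor alleles by count, and summarize the trait within each group. A minimal Python sketch of that step, assuming homozygous biallelic calls; the function name and the toy values are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize_by_allele(alleles, trait):
    """Return (major, minor, {allele: mean trait}) for one biallelic SNP.

    Accessions with a missing allele call or trait value (None) are skipped.
    """
    pairs = [(a, t) for a, t in zip(alleles, trait)
             if a is not None and t is not None]
    counts = Counter(a for a, _ in pairs)
    if len(counts) != 2:
        raise ValueError("expected exactly two alleles")
    (major, _), (minor, _) = counts.most_common(2)
    by_allele = defaultdict(list)
    for a, t in pairs:
        by_allele[a].append(t)
    return major, minor, {a: mean(v) for a, v in by_allele.items()}


# Invented allele calls and trait values for seven hypothetical accessions.
toy_alleles = ["T", "T", "T", "A", "T", "A", None]
toy_trait = [6.0, 5.0, 7.0, 3.0, 6.0, 4.0, 5.0]
major, minor, means = summarize_by_allele(toy_alleles, toy_trait)
print(major, minor, means)  # prints T A {'T': 6.0, 'A': 3.5}
```

The group means alone do not establish significance; a test such as the one-way ANOVA or a t-test on the two groups would follow, run separately for the whole population and for each subgroup.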

Mining Candidate Genes for Each QTL
Local LD blocks were defined for each QTL (Supplementary Table 3), and the genes in the LD blocks were extracted for further study (Supplementary Table 4). Within the local LD blocks, several known genes were remapped for the target traits, such as OsGBSSI (Os06g0133000) for AC, Rc (Os07g0211500) for PC, GS3 (Os03g0407400) for LWR and GV, and GW5 (Os05g0187500) for LWR and GV. We also identified missense SNPs in these genes, consistent with previous studies. Detailed information is listed in Supplementary Table 5.
Among the QTLs associated with LWR, qLWR7-2 was identified by both the MLM and BLINK methods with significant p-values of 1.10E-09 and 1.11E-09, respectively. A 40.24-kb block was defined (Figure 7A; Supplementary Table 3), and only eight genes (Table 2) were located in this region. Six of them were genes of unknown function, and Os07g0555200 encodes a translation initiation factor 4G, which is associated with resistance to rice tungro spherical virus. Notably, OsFbox394 (Os07g0555000) encodes an F-box domain protein. The F-box genes form a large family in rice with 687 members (Jain et al., 2007) and play crucial roles in several biological processes, such as flower development (Duan et al., 2012), leaf senescence (Chen et al., 2013), grain size (Chen et al., 2013), and the development of inflorescence branches and spikelets (Ikeda et al., 2007). In this study, no missense SNP was found in the CDS region of this gene. We then investigated its expression levels in nine tissues, and the results showed that OsFbox394 is specifically expressed in rice seed and endosperm tissues (Figure 7B), suggesting that this gene might regulate seed development through changes in its expression level. qAC12-1 was a significant (p-value = 1.42E-08) QTL associated with AC identified by the MLM method (Figure 8A). Only four genes were located in the local LD block of qAC12-1, namely, Os12g0472500, Os12g0472800, Os12g0472900, and Os12g0473900. Among them, OsHyPRP45 (Os12g0473900) was annotated as a member of the protease inhibitor/seed storage/lipid transfer protein family (Supplementary Table 4). Tissue-specific expression analysis revealed that the expression of OsHyPRP45 was elevated in rice seedlings (i.e., three days after sowing) and plumules (i.e., 48 h after emergence) (http://ricevarmap.ncpgr.cn/vars_in_gene/).
Five haplotypes were defined based on the six missense SNPs in this gene (Figure 8B), and Hap C showed significantly lower AC than the other haplotypes, including Hap B (p-value = 9.9E-09) and Hap D (p-value = 1E-11) (Figure 8C). Taken together, these results suggest that OsHyPRP45 is a candidate gene regulating AC in rice grain.
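Defining haplotypes from a set of SNPs, as described above, can be viewed as grouping accessions by their combined allele string and then summarizing the trait per group. The Python sketch below illustrates this idea on invented data; the accession names, alleles, and AC values are hypothetical, and the haplotype labels used in Figure 8 are not reproduced.

```python
from collections import defaultdict
from statistics import mean


def haplotype_means(genotypes, trait, min_count=1):
    """Group accessions by allele string across SNPs and average the trait.

    genotypes: {accession: tuple of alleles, one per SNP, in a fixed order}
    trait:     {accession: trait value}
    Returns {haplotype string: (number of accessions, mean trait)}.
    """
    groups = defaultdict(list)
    for acc, alleles in genotypes.items():
        # Skip accessions without a trait value or with any missing call.
        if acc in trait and None not in alleles:
            groups["".join(alleles)].append(trait[acc])
    return {hap: (len(v), mean(v))
            for hap, v in groups.items() if len(v) >= min_count}


# Invented calls at six SNPs and invented AC values (percent).
toy_genotypes = {
    "acc1": ("A", "C", "G", "T", "A", "C"),
    "acc2": ("A", "C", "G", "T", "A", "C"),
    "acc3": ("G", "C", "G", "T", "A", "C"),
    "acc4": ("G", "C", "G", "T", "A", "C"),
    "acc5": ("A", "C", "G", None, "A", "C"),  # missing call, excluded
}
toy_ac = {"acc1": 14.0, "acc2": 16.0, "acc3": 24.0, "acc4": 26.0, "acc5": 20.0}

result = haplotype_means(toy_genotypes, toy_ac)
print(result)  # prints {'ACGTAC': (2, 15.0), 'GCGTAC': (2, 25.0)}
```

The `min_count` parameter is an illustrative way to drop rare haplotypes before pairwise comparison, since groups with very few accessions give unreliable means.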

DISCUSSION
A total of 406 accessions, which exhibited wide diversity in grain quality and appearance, were analyzed in this study. The GT of IND, TEJ, and TRJ was relatively high and needs to be reduced to meet consumer preferences. Only the accessions from TEJ had the preferred AC, while the AC of most accessions from the other subgroups needs to be decreased. GPC is an important nutritional component of rice grain. In this study, the accessions from ARO or TRJ had higher GPC than the others, suggesting that key genes regulating GPC might be discovered in ARO and TRJ. Purple and red rice grains contain anthocyanins and proanthocyanidins, respectively (Furukawa et al., 2007), which can act as antioxidants (Kong et al., 2003; Rauf et al., 2019) and could benefit human health. The accessions from AUS usually displayed red or purple grain, while the majority of individuals from the other subgroups showed white or light brown color. Grain size and shape were related: accessions with slender shapes, such as those from ARO, frequently had a smaller size. In contrast, round-shaped rice, including the accessions from TEJ, generally had a bigger size. These wide variations provide the potential to improve grain quality and appearance simultaneously (Table 3). For example, Ta Mao Tsao (NSFTV155) and Ligerito (NSFTV350) were two accessions that combined high GPC, big grain size, and desirable GT and AC, and they have potential as basic materials for breeding. Karabaschak (NSFTV224) was a unique material with a large GV, red color, high GPC, and suitable AC but a higher GT, which could be lowered by manipulating the genes controlling it. Another accession, WIR 3764 (NSFTV306), with characteristics similar to those of Karabaschak but a light brown color, also needs a lower GT for better ECQs.
A possible approach to decrease the GT of NSFTV224 and NSFTV306 is to replace allele C at S6_6687883 with allele G of the OsSSIIa gene (Figure 6). A TEJ line, R 101 (NSFTV310), with high GPC, medium size, suitable GT, and low AC, could be improved in grain size and AC through the key genes GS3 (changing allele T at S3_16733441 to allele G) (Fan et al., 2006) and OsGBSSI (changing allele T at S6_1765761 to allele G) (Wang et al., 1995).
In this study, we performed GWAS to identify QTLs associated with GT, AC, GPC, PC, LWR, and GV, and many of them overlapped with previously reported genes. qGT6-2 was only 60 kb from the OsSSIIa gene, and qAC6-1 was located inside the OsGBSSI gene. Rc, associated with PC (qPC7-1), was also remapped in this study. GS3 (Fan et al., 2006) and GW5 (Weng et al., 2008) are well-studied genes regulating grain size, grain length, and grain width. These two genes were also detected in association with LWR (i.e., qLWR3-2 and qLWR5-1) and GV (i.e., qGV3-2 and qGV5-1). In addition, OsRPH1, which has been reported to be associated with plant height, grain length, width, and thickness in rice, was covered by qLWR5-3 in this study. The key missense SNPs were also identified (Supplementary Table 5). Together, these results support the efficiency and accuracy of this study. Only two QTLs were identified for GPC. By comparing GPC between the alleles of each leading SNP in the whole population, we found that both loci exhibited significant differences (p = 9.75E-05 for qGPC11-1 and p = 0.0011 for qGPC12-1) (Supplementary Figures 1A,C). For S12_2673221, individuals carrying T exhibited higher GPC than those carrying the alternative allele in both the IND (p < 0.01) and TRJ (p < 0.001) subpopulations, a trend similar to that in the whole population (Supplementary Figures 1C,D). For qGPC11-1, in contrast, the materials that carried G showed significantly higher GPC than those with A, which was opposite to the trend in the whole population. Thus, this site might be a unique QTL controlling GPC in the japonica subspecies.

CONCLUSION
This study has confirmed a wide variation of grain quality and appearance in the RDP1 accessions and identified a few germplasm lines superior in GT, AC, GPC, PC, LWR, and GV for the improvement of rice grain quality and appearance. A total of 19 and 24 loci associated with grain quality and appearance, respectively, were identified, and nine of them were identified by both the MLM and BLINK methods. Notably, a candidate gene regulating seed development, OsFbox394, was discovered in qLWR7-2, and OsHyPRP45 was identified as a candidate gene affecting AC in rice grain. Further experiments are required to validate the function of these genes. Overall, this study provides basic information for further genetic and molecular studies on grain quality and appearance in rice.

DATA AVAILABILITY STATEMENT
The original contributions generated for the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.