Association Analysis of Markers Derived from Starch Biosynthesis Related Genes with Starch Physicochemical Properties in the USDA Rice Mini-Core Collection

Rice eating and cooking quality is largely determined by starch physicochemical properties. The diverse accessions in the USDA rice mini-core collection (URMC) facilitate extensive association analysis of starch physicochemical properties with molecular markers specific to starch biosynthesis related genes. To identify significant trait-marker associations that can be utilized in rice breeding programs for improved starch quality, we conducted two association analyses between 26 molecular markers derived from starch biosynthesis related genes and 18 parameters measured of starch physicochemical properties in two sets of the mini-core accessions successfully grown in two environments in China. Many significant trait-marker associations (P < 0.001) were detected in both association analyses. Five markers of Waxy gene, including the (CT)n repeats, the G/T SNP of intron 1, the 23 bp sequence duplication (InDel) of exon 2, the A/C SNP of exon 6, and the C/T SNP of exon 10, were found to be primarily associated with starch traits related to apparent amylose content (AAC), and two markers targeting the 4,329–4,330 bp GC/TT SNPs and 4,198 bp G/A SNP of SSIIa gene were mainly associated with traits related to gelatinization temperature (GT). Two new haplotypes were found in the mini-core collection based on the combinations of the 23 bp InDel and three SNPs (G/T of intron 1, A/C of exon 6, and C/T of exon 10) of Waxy gene. Furthermore, our analyses indicated that the (CT)n polymorphisms of Waxy gene had a non-negligible effect on AAC related traits, as evidenced by significant variation in AAC related traits among rice accessions with the same Waxy SNPs but different (CT)n repeats. As the five Waxy markers and the two SSIIa markers showed consistent major effects on starch quality traits across studies, these markers should have priority for utilization in marker-assisted breeding.


INTRODUCTION
As the major component in rice grain, starch accounts for over 90% of endosperm weight, and thus the physicochemical properties of starch have foremost influence on rice eating and cooking quality. Two types of starch polymers exist in rice endosperm: amylose and amylopectin. Amylose is linear with few branches while amylopectin is highly branched. Amylose content and the fine structure of amylopectin are the main determinants of rice eating and cooking quality (Juliano, 1985;Bao et al., 2009;Syahariza et al., 2013;Kong et al., 2015). Apparent amylose content (AAC), gelatinization temperature (GT) and gel consistency had been established and still routinely used for grain quality evaluation (Juliano, 1985). Additional indicators have also been developed for more precise evaluation of grain quality, such as pasting viscosity profile, thermal, and retrogradation properties. All these starch physicochemical properties could be grouped as AAC-related traits and GT-related traits according to previous studies (Wang et al., 2007;Yang et al., 2014).
Starch biosynthesis in rice is affected by a concerted network of many enzymes. Generally, four classes of enzymes are involved in starch biosynthesis. They are adenosine diphosphate pyrophosphorylase (AGPase), starch synthase (SS), starch branching enzymes (SBEs), and starch de-branching enzymes (DBEs) (Martin and Smith, 1995;Smith et al., 1997). Among them, amylose biosynthesis is primarily controlled by granule bound starch synthase-I (insoluble starch synthase; He et al., 1999) while amylopectin production is a teamwork of SS (soluble starch synthase), SBEs and DBEs with distinct roles. SS catalyzes the transfer of the glucosyl moiety of ADP-glucose to the reducing end of pre-existing α-1,4 linked glucan chains; SBEs cleave α-1,4 chains and transfer them to C 6 hydroxyls to form the branches α-1,6 chains. DBEs remove improperly branched α-1,6 chains within branched clusters (Nakamura, 2002;Nakamura et al., 2010).
The complexity of the process of rice starch biosynthesis is increased by the fact that those enzymes all have isoforms encoded by different genes. The AGPase has four large (AGPL1-4) and two small (AGPS1, AGPS2) catalytic subunits (Lee et al., 2007). A total of 10 isoforms in five types of starch synthase enzymes were reported: GBSS (I, II), SSI, SSII (SSIIa, SSIIb, SSIIc), SSIII (SSIIIa and SSIIIb), SSIV (SSIVa and SSIVb; Tatsuro and Tomio, 2004). Three isoforms of SBE: SBEI, SBEII (SBEIIa, SBEIIb; Yamanouchi and Nakamura, 1992) and two types of DBE: isoamylase and pullulanase have been reported (Nakamura et al., 1996;Fujita et al., 1999). The nucleotide sequence variation in these starch biosynthesis related genes can affect both amount and function of corresponding enzymes, and consequently the content and fine structure of amylose and amylopectin will be changed, leading to variation in starch physicochemical properties (Wang et al., 1995;Ayres et al., 1997;Larkin and Park, 2003;Umemoto and Aoki, 2005;Waters et al., 2006;Yu et al., 2011).
Association analysis is a powerful method to dissect the relationship between gene sequence polymorphisms and phenotypic variations (Thornsberry et al., 2001;Gupta et al., 2005). Many association analyses have been conducted on rice landraces and cultivars from different germplasm resources (see Table 1 in Zhang et al., 2016), but most of these studies are centered on agronomic traits in relation to yield, flowering time and disease resistance. Relatively few association analyses investigated starch physicochemical properties. Moreover, most of the studies were conducted on a limited number of starch traits using less diverse rice germplasm or on a limited number of molecular markers or markers that are not related to starch synthesis, which could make their findings unrepeatable. For example, Ayres et al. (1997) reported a very strong relationship between Waxy microsatellite alleles and AAC, but later studies confirmed this was largely due to the narrow genetic base in the employed samples (Bergman et al., 2000). As many markers derived from genes involved in starch biosynthesis are now available, it is necessary to investigate the utility of these markers with diverse rice germplasm for a better understanding of the genetic basis of starch quality traits.
The USDA rice mini-core collection (URMC) consists of 217 accessions that were selected to represent over 18,000 accessions in the USDA global gene bank of rice (Agrama et al., 2009). It has been used in QTL mapping for genes associated with grain yield and other agronomic traits, disease resistance, or protein concentration (Li et al., 2011Jia et al., 2012;Bryant et al., 2013). In this study, we conducted two association analyses to investigate the relationships between 26 molecular markers derived from 18 starch biosynthesis related genes and 18 parameters of starch physicochemical properties measured for diverse URMC accessions successfully grown in two environments in China. Our aim was to compare the importance of these molecular markers in determining starch traits and to find the most efficient markers or marker combinations/haplotypes that can be used in developing rice cultivars with improved cooking and eating quality via markerassisted breeding.

Plant Materials
Samples of 217 accessions from the USDA rice mini-core collection were provided by U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS). The accessions were grown in Hainan (18 • N) from December 2013 to April 2014, and in Hangzhou (18 • N) from June to October in 2014, in a randomized block design with two replications within each environment. In each replicate, two rows and six plants per row for each accession were planted at a spacing of 20 cm between and within rows. However, only 160 accessions in Hainan and 155 in Hangzhou produced enough seeds for measurement of starch traits.

Measurement of Starch Physicochemical Properties
The rice seeds harvested from the two environments Hainan and Hangzhou were measured for 18 parameters of starch physicochemical properties as described in Li et al. (2017). The AAC was determined by a colorimetric method. Gel texture Frontiers in Plant Science | www.frontiersin.org properties including hardness (HD), adhesiveness (ADH), cohesiveness (COH) were measured by TA-XT2i Texture Analyzer (Texture Technologies Corp., Scarsdale, NY). RVA pasting profile was determined by a Rapid Visco Analyzer (RVA, Model 3-D, Newport Scientific, Warriewood, Australia), and the measured parameters include peak viscosity (PV), hot paste viscosity (HPV), cool paste viscosity (CPV), peak time (Ptime), pasting temperature (PT) and three derivative parameters: breakdown (BD = PV-HPV), setback (SB = CPV-PV), and consistency (CS = CPV-HPV). Thermal properties were measured using a DSC 2,920 thermal analyzer (TA Instruments, Newcastle, DE, USA), and the measured parameters include onset temperature (T o ), peak temperature (T p ), conclusion temperature (T c ), enthalpy of gelatinization ( Hg), width at half peak height ( T 1/2 ), and percentage of retrogradation (R%). All these parameters were measured in duplicate, and the mean values of the 18 measured parameters were used for association analysis.
Wide variation was observed in all 18 parameters measured of starch physicochemical properties in both sets of rice samples successfully grown in Hainan and Hangzhou Supplementary Table 1). For example, AAC ranged from 1.1 to 29.4% and averaged 20.6%, with CV of 35.44% in the 160 accessions harvested from Hainan. A similarly high level of variation in AAC was shown in the 155 accessions from Hangzhou. High genetic diversity was also observed in other parameters of starch physicochemical properties, such as gelatinization temperature (T p ) which ranged from 66.7 to 81.0 • C and from 66.1 to 82.3 • C in the two sets of samples respectively. These parameters were used in the present association analysis to identify marker-trait relationships.

DNA Isolation and Genotyping
Whole genomic DNA was extracted from five seedlings of each accession using the CTAB method of Doyle (1991).
Twenty-five molecular markers derived from 18 starch biosynthesis related genes were used for genotyping all 217 rice accessions for this study (Supplementary Table 2).
PCR reaction was performed in a 10 µl reaction mixture containing 20 ng of template DNA, 1X PCR buffer, 2 mM MgCl 2 , 0.2 mM dNTPs, 0.2 µM of each primer and 1 unit of Taq DNA polymerase. All amplifications were performed on a PTC-100 thermal cycler (MJ Research, Inc.) under following conditions: 5 min at 94 • C, followed by 45 s at 94 • C, 60 s at T A (Supplementary Table 2), 60 s at 72 • C for 35 cycles, and 7 min at 72 • C for a final extension.
The PCR products of InDel and STS markers were either resolved on 2.0% agarose gel containing 0.05 µl/mL gel red in 1X TBE buffer and visualized using a gel documentation system, or separated on 8% polyacrylamide denaturing gels and visualized by silver staining (Bassam et al., 1991).
PCR products of SNP markers were digested with restriction endonucleases (New England BioLabs; Supplementary Table 2) according to manufacturer's instructions. The digests were separated on 8% polyacrylamide gel and visualized by silver staining.

Statistical Analysis
A total of 26 molecular markers were used in the association analysis. In addition to the 25 marker loci we genotyped for the 217 URMC accessions, the (CT) n polymorphism of marker RM190 of Waxy gene reported in Li et al. (2010) was included in the final data analysis.
Data analysis was performed using the SAS system for windows version 8 (SAS Institute Inc., Cary, NC, USA). The Student-Newman-Keuls test was conducted for comparison of means at P < 0.05, and PROC GLM was used for analysis of variance determination.
PowerMarker (Version 3.25) was used to calculate the polymorphism information content (PIC) for the 26 markers and Nei's distance (Nei et al., 1983) between rice accessions. Based on Nei's distance, UPGMA was performed and the tree was viewed in MEGA 7.0.

Association Analysis
Using the same set of 26 molecular markers and the mean values of 18 measured parameters of starch traits, we conducted two separate association analyses for the 160 rice accessions grown in Hainan and 155 accessions grown in Hangzhou, respectively, using the mixed linear model (MLM) of TASSEL (Version 2.1) which takes both population structure (Q) and kinship (K) into account, and rare alleles (frequency <5%) were removed before association analysis. We estimated the parameters of Q and K based on 155 SSR markers reported in Li et al. (2010). The detection of marker-trait association was determined by the p-value (marker) (p < 0.01).

Allelic Diversity at the Marker Loci from 18 Starch Biosynthesis Related Genes
All the 26 gene specific markers were polymorphic in the 217 mini-core accessions, with polymorphic information content (PIC) ranging from 0.09 (marker SSIVb-Indel) to 0.85 (marker RM190; Table 1).
Following Li et al. (2010), the 217 accessions in the minicore were divided into seven ancestry groups, and the PIC of the 26 markers in each group was calculated in the present study ( Table 1). The "Admix" group, which refers to the rice accessions having the mixed ancestry (such as the TEJ-TRJ mixture, and the IND-AUS mixture), had two or more alleles at each of the 26 marker loci whereas allele fixation occurred at some of the loci in the other six groups. Allele fixation occurred at 14 marker loci in the ARO group, likely due to a small number of accessions (n = 6) belonging to the group. In contrast, only six loci were fixed in the wild rice (the WD group, n = 12). Compared to the ancestry groups TEJ, IND, and AUS, the TRJ group showed much lower PIC for four markers: 23 bp InDel of exon 2, A/C SNP of exon 6, and C/T SNP of exon 10 in Waxy gene, and the GC/TT SNPs in SSIIa.
UPGMA analysis based on Nei's genetic distance was used to visualize genetic relationships among the 217 accessions (Figure 1). Two major clusters were observed, one consisting of nearly all accessions of ancestry groups AUS, IND, and WD, and FIGURE 1 | UPGMA tree of 217 access in USDA rice mini-core collection based on Nei's genetic distance calculated from 26 markers derived from 18 starch biosynthesis genes. The seven ancestry groups include IND indica, AUS aus, TEJ temperate japonica, TRJ tropical japonica, ARO aromatic, WD wild rice, and Admix accessions with mixed ancestry (classified according to Li et al., 2010 the other consisting of nearly all accessions of ancestry groups TEJ, TRJ, and ARO. Within each of the two major clusters, however, the accessions of the same ancestry group could be placed into several subgroups. As it might be expected, the accessions of Admix group spread into many subgroups both within and between the two major clusters.

Association Analysis
Results of the two marker-trait association analyses for the 160 rice accessions grown in Hainan and 155 accessions grown in Hangzhou were shown in Table 2. The common markers detected in both association analyses were all from Waxy gene and SSIIa gene. The Waxy-exon 2 marker was consistently detected as having association with nearly all parameters of starch physicochemical properties (except for thermal properties). The common markers detected for the parameters of pasting viscosity (PV, HPV, CPV, SB, BD, and Ptime) were all from Waxy gene. For parameters of thermal property, such as T o , T p , and T c , the common markers detected were all from SSIIa gene. For HD and ADH of gel texture and AAC, markers from both Waxy gene and SSIIa gene were detected. Interestingly, Waxy-exon 10 marker was found to have association with only HPV and CPV (p < 0.0001) in both analyses. The major difference between the two analyses was in P-values (p_Marker in Table 2). For example, association between Waxyexon 2 and AAC was detected with P-value of 1.08 × 10 −21 in the Hainan samples, but a much lower P level (1.06 × 10 −8 ) was found in the Hangzhou samples for the same association. There were also some other differences between the two analyses. For example, the Waxy-exon 10 and HD association (3.11 × 10 −4 ) was detected only in the Hainan samples, suggesting a sampling effect in the marker-trait association analysis.
In order to detect other markers that could differentiate trait variation under the same Waxy or SSIIa marker background, further association analyses were conducted with the five Waxy markers [(CT) n repeats, G/T SNP in intron 1, 23 bp InDel in exon 2, A/C SNP in exon 6, C/T SNP in exon 10] and two SSIIa markers (G/A SNP at 4198 bp and GC/TT SNPs at 4,329-4,330 bp) set as covariates, respectively ( Table 2). Under the Waxy background, associations were found between markers from SSIIa gene and parameters PT, T o , T p , and T c in both analyses. In contrast, few common markers were found for other starch traits. Only SBEIIb A/G and SSIIIb AG/AGAG were found for HPV and SB respectively in both analyses. Under the SSIIa background, the Waxy-intron 1 marker was detected only for two parameters T o and T p , but the associations of Waxy-exon 2 marker were detected with many parameters, such as AAC, HD, ADH, COH, PV, HPV, CPV, CS, Ptime, and PT. In addition, the associations of Waxy-exon 10 with HPV and CPV were detected in both analyses.

Waxy Haplotypes in the Rice Accessions Grown in Hainan
A total of nine Waxy haplotypes were identified in the 160 accessions grown in Hainan based on the allelic combinations of the three SNPs and the 23 bp InDel in Waxy gene (Figure 2A). There were 137 accessions falling into T1AC, G1CC, G1AC, and G1AT haplotype groups. The G1AC group consisted of most accessions (53 accessions) while the other three haplotypes had similar numbers of accessions (from 25 to 30 accessions). Two haplotypes (T1AT and T1CT) among the nine had not been discovered before. However, these haplotypes may be rare in nature, as they could only be found in a small number of accessions: six accessions with the T1AT haplotype, and only one accession with the T1CT haplotype (Supplementary Table 3).
The associations of SNP combinations (including the 23 bp Indel in exon 2) and the (CT) n classes of Waxy gene were shown in Figure 2B. The T2AC and G2AC haplotypes (harboring a 23 bp duplication in exon 2) were both associated with (CT) 16 or (CT) 17 . Most rice (20 out of 25) with the G1AT haplotype were associated with the (CT) 11 allele. The G1AC haplotype was mainly associated with short (CT) n repeats of 8, 10, 11, and long (CT) n repeat of 18; The T1AC haplotype was mainly associated with short (CT) n repeats of 8, 10, and long (CT) n repeat of 18. Rice accessions with G1CC haplotype were associated with (CT) n groups of 14, 16, 17, 18, 19, and 20. Other rare haplotypes were found in several (CT) n groups.

Means and Ranges of AAC Related Traits in Rice Accessions with Different Waxy Haplotypes
The accessions with haplotypes T2AC and G2AC had significant lower mean AAC ( Table 3). The G1AT and G1AC groups all had a mean AAC higher than 23% while the mean AAC of T1AC and G1AC were all <23% but higher than 18%. Three accessions with T1AT haplotype were associated with short (CT) n of 8, 11, 12 repeats, and they had AAC at 26.6%, 26.0% and 24.3% respectively.
For parameters HD, ADH and COH of gel texture, the variations of mean values among haplotype groups were highly correlated with that of AAC. The groups having higher mean AAC also had higher mean HD, COH and lower mean ADH (Table 3). However, there was no strong relationship for parameters of pasting viscosity. The G1AT haplotype with highest mean AAC showed highest mean values for all viscosity parameters, but G1AC haplotype which had the second highest mean AAC showed the lowest mean value for PV and HPV, suggesting a significant effect of exon 10 C/T SNP on PV and HPV. On the other hand, the fluctuation of mean values of CPV, CS, and SB followed the trend of AAC. Ptime did not differ significantly among haplotypes.

The Effect of RM190 SSR Polymorphism on AAC Related Traits
The association between AAC and RM190 polymorphism was visualized in Figure 3. Generally, the shorter repeats of (CT) 8 , (CT) 10 , and (CT) 11 were primarily associated with high AAC (AAC > 23%), and the range of AAC in these three (CT) n classes was rather narrow. In contrast, the longer (CT) n repeats of 17, 18 covered a wide span of AAC types, primarily ranging from 0 to 23%. For (CT) 14 , (CT) 16 , and (CT) 20 groups, nearly all accessions had AAC < 23% but the numbers of accessions in these groups were small. The other parameters of AACrelated traits also differed significantly among (CT) n classes (Supplementary Table 4). Since rice accessions with T1AC and G1AC haplotypes were both fairly evenly distributed in (CT) 8 , (CT) 10 , and (CT) 18 classes, mean values of AAC-related traits were compared between accessions with same SNP combinations but different (CT) n polymorphisms ( Table 4). The results showed that, compared to G1AC haplotype, T1AC haplotype was lower in mean values of AAC, HD, CS, SB, and higher in ADH, PV and BD only in the (CT) 18 group. In contrast, no traits differed significantly between T1AC and G1AC haplotypes in either (CT) 8 or (CT) 10 classes. Furthermore, within T1AC and G1AC, the (CT) 18 class differed significantly in several parameters when compared with the other two groups. This suggested Waxy (CT) n microsatellite polymorphisms had a non-negligible effect on AAC-related traits.

Analysis of Variance in AAC Related Traits
The variances of AAC-related traits explained by each polymorphic site of Waxy gene and their combinations ( Table 5) showed that for nearly all parameters, the (CT)n microsatellite polymorphism explained the largest portion of the total variance (R 2 = 0.153 − 0.631), followed by the 23 bp InDel of exon 2. The intron 1 G/T SNP alone only explained 16.4% of total variance in AAC (P < 0.0001). The exon 6 A/C SNP caused limited variance for all parameters, whereas the exon 10 C/T SNP was greatly responsible for variances in PV, HPV and CPV (R 2 = 0.124 − 0.298; P < 0.0001). Interestingly, the SNP combinations explained less variance than RM190 for PV, HPV, SB, BD, HD, ADH, and COH. The three SNP sites together with the 23 bp InDel explained 52.8% of variance in AAC (P < 0.0001), but when RM190 was included in the analysis, all the markers together explained as much as 82.0% of the variance  (P < 0.0001). The effect of RM190 on AAC determination was further highlighted in non-waxy rice (150 accessions) where all the SNPs together only explained 14.6% (P < 0.001) of the variance, but RM190 alone could explain up to 48.7% (P < 0.0001, Supplementary Table 5). It should be noted, however, the portion of variance of AAC explained by each of the markers and their combinations varied greatly among different sample groups (Supplementary Table 5).

GT related Traits and SSIIa Alleles in 160 Rice Accessions Grown in Hainan
Based on the SNP sites of SSIIa gene, three haplotypes were identified. Out of the 160 accessions harvested from Hainan, 124 had the GC-G haplotype (GC SNPs at 4,329-4,330 bp sites and G SNP at 4,198 bp site), indicating this was the most frequent haplotype in this set of accessions. The other two haplotypes had fewer accessions, with 24 (TT-G) and 12 (GC-A) accessions respectively ( Table 6). Gelatinization temperature ranged from 67.2 to 81 • C, 67 to 76.7 • C, and 66.7 to 73.4 • C in haplotypes GC-G, TT-G and GC-A respectively. The GC-G haplotype had significant higher mean PT, T o , T p , and T c but lower T 1/2 than the other two haplotypes. The variance explained by the SSIIa SNPs ranged from 20.6% to 60.4% for these five parameters ( Table 7). On the other hand, Hg and R% did not differ significantly among these SSIIa haplotypes (Table 6).

DISCUSSION
Rice is the most important food crop in Asia, and it feeds about half of human population world-wide. Phenotypic variation in natural and cultivated rice and the genetic basis underlying complex traits have been the focus of extensive research. With the advance of genome-wide association analysis, candidate genes or genetic architecture associated with important traits, such as rice grain size and weight, flowering time, disease resistance, abiotic stress tolerance, and metabolites have been the focus of recent studies (e.g., Huang et al., 2010Huang et al., , 2012Famoso et al., 2011;Zhao et al., 2011;Chen et al., 2014). However, genome-wide association studies to identify genes or gene markers for starch quality traits are relatively few. The USDA rice mini-core collection consists of 217 accessions originated from 76 countries and more than 14 geographical regions worldwide. These diverse accessions are valuable for making  new discoveries about genetic determinants of starch quality traits.

Genetic Variation among Ancestry Groups in the USDA Rice Mini Core Collection
As shown in Figure 1, the 26 markers derived from 18 starch biosynthesis related genes could separate the seven ancestry groups in the mini core rice, as classified by Li et al. (2010), into two deeply divided clusters, but they are not sufficiently diverse to separate the ancestry groups within each cluster. This pattern of genetic divergence could provide an explanation for the conflicting findings about rice starch physicochemical properties in some previous studies. For example, some studies reported that AAC was positively correlated with PV whereas others found that the two parameters were negatively correlated or had not a strong relationship (Caffagni et al., 2013;Yang et al., 2014;Kong et al., 2015). This kind of discrepancy is probably due to different sample constitutions in different studies. As shown in the present study (Table 3), due to the effect of exon 10 C/T SNP, high AAC rice could have high or low PV. The relationship between AAC and PV thus could be different depending on the percentage of different Waxy gene haplotypes in the germplasm used by different studies. On the other hand, the importance of a marker on certain phenotypic trait could be overestimated due to a narrow genetic base of the samples, as shown in Ayres et al. (1997). Furthermore, the effect of markers from minor genes may be limited to certain populations (Lu and Park, 2012). Taking these factors into consideration, we performed two association analyses on a large number of accessions planted in two environments in order to find the markers with potential to discriminate phenotypic variations in diverse rice germplasm contained in the mini-core collection. The marker-trait associations identified in both analyses were considered to be reliable and strong associations.

Marker-Trait Association Identified in this Study
The genetic basis of rice starch physicochemical properties has been studied previously. Among over twenty starch biosynthesis related genes, Waxy gene was reported as the primary gene responsible for amylose content by multiple QTL mapping studies (He et al., 1999;Tan et al., 1999;Septiningsih et al., 2003;Xu et al., 2015). In the previous association analyses, a total of five polymorphic sites in Waxy gene were reported as having important effects on AAC variation (Ayres et al., 1997;Larkin and Park, 2003;Wanchana et al., 2003;Chen et al., 2008;Asante et al., 2013;Caffagni et al., 2013). In our association analyses, we included five markers specific to all the five polymorphic sites (G/T SNP of intron 1, 23 bp InDel of exon 2, G/C SNP of exon 6 and C/T SNP of exon 10, and (CT) n repeats) in order to compare their relative importance. We found that the 23 bp InDel in exon 2 and (CT) n microsatellite polymorphisms were most strongly associated with AAC while intron 1, exon 6, and exon 10 were less significant as suggested by the P-values ( Table 2), and these findings were confirmed by both association analyses.   The SBEIIb C/G SNP was reported to be significantly associated with amylose content and viscosity properties (Lu and Park, 2012). However, it should be pointed out that this association was only found for certain populations in the previous study. The present study included diverse rice accessions of different ancestry, but the purported associations involving SBEIIb C/G were not found. Similarly, several markers were identified having association with some starch traits, but the associations were not consistently found for both Hainan and Hangzhou samples. This type of inconsistency suggested that compared to Waxy and SSIIa genes, these markers may have minor or environment-dependent effects on starch traits. For further evaluation of their roles in determining starch characteristics, constructing near-isogenic-lines could be a suitable approach, as demonstrated in recent reports (Luo et al., 2015;Fan et al., 2017).
The SSIIa gene was identified as the major gene responsible for GT variation in several QTL mapping studies (He et al., 1999;Tan et al., 1999;Tian et al., 2005;Umemoto and Aoki, 2005). It is not surprising that the association analyses performed in the present study found that the SSIIa markers were highly associated with thermal properties, such as that the GC allele of SSIIa gene was usually associated with high or intermediate gelatinization temperature (GT) while the TT allele was usually associated with low GT, in agreement with the previous report, e.g., Bao et al. (2006b). The SNP A at the 4198 bp site inactivated the enzyme and thus resulted in low GT regardless of which allele (GC/TT) was present at the 4,229-4,330 bp sites (Nakamura et al., 2005). However, we found that six accessions in the GC-G group had GT lower than 72 • C, and three accessions in the TT-G group and one in the GC-A group had GT higher than 72 • C. These exceptions indicate that factors influencing GT of rice are complex which warrants further investigations.
In addition, our analysis showed that the SSIIa markers were highly associated with AAC-related traits (AAC, HD, etc.). Recently, Fan et al. (2017) also detected the effect of SSIIa gene on AAC and viscosity parameters using near-isogenic lines. Thus, the effect of SSIIa gene on AAC-related traits could not be purely attributed to gene linkage effect although the SSIIa GC SNPs were found associated with the G SNP of intron 1 in Waxy gene in most cases in this study (103 out of 137, 75.2% percent).
Previous QTL mapping studies reported that paste viscosity and gel texture were controlled by Waxy gene (Bao et al., 2000) or SSIIa gene (Wang et al., 2007). The present study showed that for the parameters of AAC-related traits, the highly associated markers were mainly from Waxy gene (P < 0.001). For these starch parameters, limited markers from other genes could be detected at P < 0.01 under Waxy background. In contrast, Waxy markers prevailed under SSIIa background. These findings suggest that Waxy gene plays a dominant role in determining AAC-related traits. For GT-related traits, the GC/TT SNPs of SSIIa were highly associated with pasting temperature in both Hainan and Hangzhou samples. This association was still significant even under Waxy background. Interestingly, although Waxy exon 10 C/T SNP was not detected as playing an important role in determining AAC, it was highly associated with HPV and CPV in both Hainan and Hangzhou environments ( Table 2). Traore et al. (2011) postulated that exon 10 SNP varied viscosity parameters by causing a proportional change of insoluble to soluble amylose. However, as Hanashiro et al. (2008) reported that Waxy gene is also responsible for biosynthesis of the extralong unit chains of amylopectin in rice which is correlated with viscosity parameters (Han and Hamaker, 2001;Horibata et al., 2004;Inouchi et al., 2005), the putative variation in amylopectin structure between rice with exon 10 C/T SNP should be investigated as another possible cause for variation in viscosity parameters in future studies.

New Waxy Haplotypes Identified in USDA Rice Mini-Core Collection
The 160 rice accessions harvested from Hainan were further analyzed for relationship between Waxy haplotypes and variation in AAC related traits. Based on the three SNPs and the 23 bp duplication of Waxy gene, we identified a total of nine Waxy haplotypes, and two of them (T1CT and T1AT) are reported for the first time. These two new haplotypes were found in seven rice accessions that all had a T SNP in intron 1 of Waxy gene, but four of them had AAC > 18%. Previously, the G to T mutation was reported leading to a reduced efficiency of GBSS pre-mRNA processing, which subsequently resulted in a lower level of spliced mature mRNA and lower AAC (Bligh et al., 1998;Cai et al., 1998;Hirano et al., 1998;Isshiki et al., 1998). Ayres et al. (1997) sequenced 42 cultivars and found all the cultivars with amylose content <18% had the T SNP at the intron 1 putative 5 ′ splice site. Larkin and Park (2003) reported the same results in 14 rice accessions. Chen et al. (2008) measured 53 rice accessions with this T SNP, and nearly all of them showed AAC < 18%. However, several studies reported some rice cultivars with the TAC waxy haplotype had AAC > 18% (Dobo et al., 2010;Asante et al., 2013;Caffagni et al., 2013;Biselli et al., 2014). Our present result also showed that the T SNP in intron 1 of Waxy gene is not always accompanied by an AAC lower than 18% (Table 3). However, the mechanism of how the SNPs of exon 6 and exon 10 affect GBSS function and starch characteristics remains unclear. It could be due to the amino acid substitutions caused by the two SNPs (Larkin and Park, 2003) that in theory can lead to structural differences affecting GBSS activity and hence starch characteristics. The diverse Waxy gene haplotypes identified in the mini-core collection may provide suitable materials for further studies to better understand the structural and functional differences among various GBSS enzymes.  *,**,***,**** indicate significance at P < 0.05, 0.01, 0.001, and 0.0001 respectively.

The (CT)n Repeats of Waxy Gene has a Non-negligible Effect on AAC Related Traits
Among the 14 (CT) n classes, the most frequent were (CT) 8 , (CT) 10 , and (CT) 18 . The ranges of AAC in rice accessions of the (CT) 8 and (CT) 10 classes were quite narrow whereas a wide range of AAC was presented in rice of the (CT) 18 class. To further examine the effect of (CT) n polymorphism on amylose content, we compared rice accessions with the same SNP combinations but different (CT) n repeats. The results showed that T1AC rice had significant lower mean AAC than G1AC rice only in the (CT) 18 class. Furthermore, when coupled with (CT) 18 , both G1AC and T1AC rice had significantly lower mean AAC than rice with the same SNPs but coupled with (CT) 8 and (CT) 10 repeats. This indicated an interaction between SNPs and (CT) n polymorphisms in determining AAC. In addition, our ANOVA results showed that a larger portion of variance of AAC could be explained when RM190 was included in the analysis ( Table 5), indicating that RM190 also has a non-negligible effect on AAC. Some previous studies suggested that the SNPs of Waxy gene had a superior effect on AAC, hence a priority on SNPs was recommended for utilization in marker-assisted selection (Chen et al., 2008;Dobo et al., 2010;Asante et al., 2013). However, as shown in the present study (Supplementary Table 5), the portion of AAC variance explained by SNPs varied considerably among different ancestry groups, and in some groups, Waxy SNPs explained far less variance than RM190. The level of variation in AAC explained by SNPs, RM190 and their combinations varied notably among studies. Using the combination of RM190 and the G/T in intron 1, Ayres et al. (1997) explained 85.9% of the variation in AAC. Dobo et al. (2010) found that 93.8% of the variation in AAC could be explained by the combination of RM190 and the SNPs in intron 1, exon 6 and exon 10. In contrast, both Caffagni et al. (2013) and Biselli et al. (2014) reported a lower level of explained variation with the same markers.

CONCLUSION
Using diverse rice accessions in the USDA mini-core collection, we conducted extensive association analyses between 26 molecular markers derived from 18 starch biosynthesis related genes and 18 parameters measured of starch physicochemical properties. We identified many significant marker-trait associations. Furthermore, we investigated marker combinations or haplotype diversity for Waxy and SSIIa genes in the mini-core rice and their effects on starch traits. In addition to the haplotypes reported in previous studies, we discovered two new haplotypes of Waxy gene in seven mini-core accessions. These accessions provided new opportunities for further studies of associations between Waxy haplotypes, GBSS enzyme activity and starch quality traits. Compared to Waxy SNPs, our analyses showed the (CT) n repeat polymorphism also played an important role in determining AAC related traits, thus should not be overlooked in marker-assisted selection (MAS). These findings can help rice breeder develop varieties with improved starch quality via MAS.

AUTHOR CONTRIBUTIONS
MS initiated the project; KL performed research; JB and HC contributed essential expertise and input in phenotypic and genetic analysis. All authors contributed to the paper writing and approved the final version to be published.