Whole-Genome Analysis of Candidate Genes Associated with Seed Size and Weight in Sorghum bicolor Reveals Signatures of Artificial Selection and Insights into Parallel Domestication in Cereal Crops

Seed size and seed weight are major quality attributes and important determinants of yield that have been strongly selected for during crop domestication. Limited information is available about the genetic control and genes associated with seed size and weight in sorghum. This study identified sorghum orthologs of genes with proven effects on seed size and weight in other plant species and searched for evidence of selection during domestication by utilizing resequencing data from a diversity panel. In total, 114 seed size candidate genes were identified in sorghum, 63 of which exhibited signals of purifying selection during domestication. A significant number of these genes also had domestication signatures in maize and rice, consistent with the parallel domestication of seed size in cereals. Seed size candidate genes that exhibited differentially high expression levels in seed were also found more likely to be under selection during domestication, supporting the hypothesis that modification to seed size during domestication preferentially targeted genes for intrinsic seed size rather than genes associated with physiological factors involved in the carbohydrate supply and transport. Our results provide improved understanding of the complex genetic control of seed size and weight and the impact of domestication on these genes.


INTRODUCTION
A growing world population and an increase in affluence are driving demand for agricultural products, especially cereals, which supply more than 75% of the calories consumed by humans (Sands et al., 2009). With limited arable land and water resources, particularly in Sub-Saharan Africa where sorghum is a staple food and the population growth rate is amongst the highest in the world, enhancing yield per unit area of cereal crops will be critical to meet this demand. Seed number per unit area and seed size are critical components of seed yield. Although seed number tends to have a bigger influence on yield (Boyles et al., 2016), seed size can make a significant contribution and may offer prospects for further yield improvement (Yang et al., 2009). In addition, it is often a major quality attribute (Lee et al., 2002). Hence, elucidating the genetic basis of seed size and the impact of domestication on seed size genes in sorghum will enhance the understanding of crop domestication and provide new targets for manipulating seed size in breeding practice.
Seed size is an important fitness trait for flowering plants and plays an important role in adaptation to particular environments. Under natural conditions, the greater resources stored in larger seeds enable seedlings to grow more rapidly at the seedling stage and increase competitiveness and survival (Manga and Yadav, 1995). However, increased seed number also translates directly into fitness, resulting in selection pressure to produce more (and thus smaller) seeds (Westoby et al., 1992). For cereal crops, the preference of early farmers for large-seeded lines for easier harvesting, processing, and planting has resulted in larger seed size being selected during domestication. This selection process has left observable genetic changes, including a reduction of genetic diversity and an increased frequency of favorable seed size alleles in cultivated lines compared to their wild progenitors (Doebley et al., 2006). For example, in rice, the favorable allele of GS3, which encodes a heterotrimeric G-protein subunit that affects seed weight and length, was highly enriched in a set of cultivated accessions of rice (Oryza sativa L.) (34%) compared to a set of wild accessions (4%; Takano-Kai et al., 2009; Botella, 2012). In maize (Zea mays L.), Bt2, which encodes the small subunit of the ADP-glucose pyrophosphorylase involved in starch biosynthesis and seed weight, has shown a 3.9-fold reduction in genetic diversity in cultivated inbred lines compared to their wild teosinte relatives (Whitt et al., 2002). Likewise, selection signatures have also been identified on other seed size genes, including PBF1 (Lang et al., 2014), GS5 (Li et al., 2011), and GIF1 (Wang et al., 2008). These selection signatures provide a "bottom-up" approach to investigate the genetic basis of domesticated traits, which has been successfully implemented in many species for other traits such as prolificacy (Beissinger et al., 2014) and northern leaf blight resistance (Wisser et al., 2008) in maize.
Seed size is a physiologically complex trait. Sorghum seeds typically tend toward a spherical shape, although considerable phenotypic variation in length, width and density does exist. The potential size of the seed is often associated with cell number, cell size and number of starch granules and is highly correlated with ovary volume at anthesis (Yang et al., 2009). However, measures associated with seed size have not been used consistently in the literature, where individual grain weight is often used as a surrogate for seed size. As key components of carbon demand (sink) during seed filling, seed size and weight are strongly associated with both carbon supply (source) and transport between carbon sources and the seed (path). The potential mass of individual seeds is determined by the rate and duration of seed filling. In sorghum, seed filling rate is highly correlated with ovary volume at anthesis, which in turn is associated with the size of the meristematic dome during early floret development (Yang et al., 2009).
Abbreviations: BBH, bidirectional best hit; RoD, reduction of diversity.
Although seeds with larger potential size tend to have greater seed mass, the extent to which this increased seed mass is actually achieved is strongly determined by assimilate availability for each seed. The amount of assimilate per seed is driven by factors affecting both seed number and assimilate supply. Total seed number per plant is determined by the number of seeds per panicle and the number of panicles per plant (i.e., tillering and branching), which are affected by a range of genetic and environmental factors (Alam et al., 2014). A negative correlation between seed size and seed number has been observed frequently in cereals (Jakobsson and Eriksson, 2000; Acreche and Slafer, 2006; Peltonen-Sainio et al., 2007; Sadras, 2007). In sorghum specifically, this trade-off has been observed by different groups (Heinrich et al., 1983; Yang et al., 2010; Burow et al., 2014). Traits such as number of seeds per panicle and number of tillers per plant are also commonly negatively correlated with seed size (Moles and Westoby, 2004). Contributors to assimilate availability for seed filling, including photosynthesis (Jagadish et al., 2015), have shown positive correlations with seed size. Environmental factors can also exert a strong influence on seed size by affecting assimilate supply (Jenner, 1994; Borrell et al., 2014) and carbon translocation (Zolkevich et al., 1958).
In accordance with this physiological complexity, seed size has been identified as a quantitative trait controlled by multiple genes, many of which have been cloned in model species (Xing and Zhang, 2010; Li et al., 2013; Zuo and Li, 2014). In Arabidopsis, a kinase cascade consisting of HAIKU1, HAIKU2, and MINISEED3 promotes seed development zygotically (Luo et al., 2005; Wang et al., 2010), while TTG2 (Garcia et al., 2005), AP2 (Ohto et al., 2009), and ARF2 (Okushima et al., 2005) are engaged in the maternal control of seed size. In rice, QTLs including GS3 (Mao et al., 2010), GS5 (Li et al., 2011), GW2 (Song et al., 2007), GW5 (Liu et al., 2017), GW8 (Wang S. et al., 2012), and GL7 were reported to regulate seed size by controlling cell division, while the influence of SRS3 (Kitagawa et al., 2010), D61 (Morinaka et al., 2006), and SRS5 (Segami et al., 2012) on seed size is related to the regulation of cell size. Additionally, the role of GIF1 in carbon partitioning during early seed filling, which can impact seed weight, has been identified using functional analysis in rice (Wang et al., 2008). In maize, the Gln-4 gene (Martin et al., 2006) affects seed weight by controlling nitrogen transport to the kernel during seed filling, whereas Sh2, which encodes the large subunit of ADP-glucose pyrophosphorylase, affects seed weight by regulating starch biosynthesis (Jiang L. et al., 2013). Pleiotropy is common amongst genes affecting seed size. For example, D2 (Hong et al., 2003) and SMG1 (Duan et al., 2014) also have an effect on plant architecture, TH1 affects seed number, and TGW6 (Ishimaru et al., 2013) influences translocation efficiency from source organs. These genes may thus affect seed size via source-sink dynamics.
Sorghum, second only to maize among C4 cereals in terms of the scale of grain production, is known for its adaptation to heat and drought stress, and is a staple for 500 million of the world's poorest people. Despite the great importance of this crop, the genetic basis of seed size in sorghum has been the subject of relatively few studies, and little information is available about genetic control of the trait and signatures of domestication. Hence, this study aims to investigate the polymorphism patterns and signatures of domestication of candidate genes associated with seed size and weight by using resequencing data for a diverse group of wild and weedy genotypes and landraces (Mace et al., 2013), in order to enhance understanding of crop domestication and to provide potential targets for manipulating seed size in sorghum breeding.

MATERIALS AND METHODS
Data Collection
Genes associated with seed size and weight (hereafter referred to as seed size) in three species, maize, rice and Arabidopsis, were identified through a comprehensive literature review (Table S1). Seed length, seed width, and seed density are all potentially associated with seed size; therefore multiple parameters, including thousand seed weight, seed length, and seed width, were used as keywords for literature searches. A subset of high confidence genes was identified, with evidence of their association with seed size supported by QTL cloning, transgenic experiments, mutant analysis, association signal, and/or near isogenic line analysis.

Identification of Orthologous Genes
Orthologous genes in sorghum were identified by combining synteny-based and Bidirectional Best Hit (BBH) approaches (Wolf and Koonin, 2012). Genomic syntenic relationships between sorghum and the model species were extracted from the Plant Genome Duplication Database (http://chibba.agtec.uga.edu/duplication/) and used to search for syntenic orthologs. For the BBH approach, a local BLASTP search was used to identify pairs of genes in two genomes that are each other's best BLAST hit (highest score).
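The BBH criterion described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the gene identifiers and bit scores are invented placeholders, and the sketch assumes the BLASTP results have already been parsed into dictionaries mapping (query, subject) pairs to scores.

```python
from typing import Dict, Tuple

# Hypothetical parsed BLASTP results: (query, subject) -> bit score.
# All identifiers and scores are placeholders for illustration only.
Hits = Dict[Tuple[str, str], float]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Return the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in hits.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> Dict[str, str]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if reverse.get(b) == a}


rice_vs_sorghum = {
    ("riceA", "sorg1"): 410.0,
    ("riceA", "sorg2"): 350.0,
    ("riceB", "sorg2"): 290.0,
    ("riceC", "sorg1"): 120.0,
}
sorghum_vs_rice = {
    ("sorg1", "riceA"): 405.0,
    ("sorg1", "riceC"): 118.0,
    ("sorg2", "riceA"): 352.0,
    ("sorg2", "riceB"): 288.0,
}

bbh_pairs = bidirectional_best_hits(rice_vs_sorghum, sorghum_vs_rice)
print(bbh_pairs)  # only riceA/sorg1 are each other's best hit
```

In this toy data, riceB and riceC each have a best hit in sorghum, but those sorghum genes match riceA better in the reverse search, so only one pair passes the reciprocal test.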

Expression Analysis of Seed Size Candidate Genes
The whole genome expression data from the study by Davidson et al. (2012) was used to investigate the differential expression of the 114 candidate genes. The data set compared expression of genes in the seed at two different time points and two different seed tissues in addition to five non-seed tissues (Davidson et al., 2012). The maximum expression value (Fragments Per Kilobase of transcript per Million mapped reads, FPKM) from any of the seed tissue samples was compared to the maximum expression value in any of the non-seed tissues and a fold difference >2 was used to define genes that were differentially highly expressed in the seed.
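A minimal Python sketch of the fold-difference rule above follows. The tissue names and FPKM values are hypothetical, and the handling of genes with zero expression in all non-seed tissues is an assumption, since the text does not say how such cases were treated.

```python
from typing import Dict, Set


def seed_enriched(fpkm: Dict[str, float], seed_tissues: Set[str],
                  fold: float = 2.0) -> bool:
    """True if max seed FPKM exceeds max non-seed FPKM by more than `fold`."""
    seed_max = max(v for t, v in fpkm.items() if t in seed_tissues)
    other_max = max(v for t, v in fpkm.items() if t not in seed_tissues)
    if other_max == 0:
        # Assumption: with no non-seed expression, any seed expression counts.
        return seed_max > 0
    return seed_max / other_max > fold


# Hypothetical tissue labels; the actual sample names are not given here.
SEED_TISSUES = {"seed_early", "seed_late", "embryo", "endosperm"}

gene_a = {"seed_early": 48.0, "seed_late": 30.0, "embryo": 12.0,
          "endosperm": 55.0, "leaf": 9.0, "root": 20.0, "stem": 4.0}
gene_b = {"seed_early": 10.0, "seed_late": 12.0, "embryo": 8.0,
          "endosperm": 11.0, "leaf": 9.0, "root": 7.0, "stem": 6.5}

print(seed_enriched(gene_a, SEED_TISSUES))  # 55 / 20 = 2.75, above threshold
print(seed_enriched(gene_b, SEED_TISSUES))  # 12 / 9 = 1.33, below threshold
```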

Gene Level Population Genetics Parameters
The sequence data of the seed size genes in sorghum were extracted from the whole genome resequencing data described in Mace et al. (2013) for 25 sorghum genotypes, representing two groups: (1) wild and weedy genotypes and (2) landraces. A number of gene-level summary statistics, including the average pairwise genetic diversity within a group, θπ (Nei and Li, 1979), and Tajima's D (Tajima, 1989), were calculated using a BioPerl module and an in-house Perl script. FST (Hudson et al., 1992) was calculated to measure population differentiation using another BioPerl module. Reduction of diversity (RoD) during domestication was calculated as the fold decrease in θπ in the landrace group compared to the wild and weedy group.
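The statistics named above can be sketched in a few lines of Python. This is an illustration only, not the BioPerl code used in the study: it assumes aligned, gap-free sequences of equal length, the toy sequences are invented, and returning infinity when the landrace group is invariant is an assumption about how RoD might be reported in that case.

```python
from itertools import combinations
from math import inf, nan, sqrt
from typing import List


def mean_pairwise_differences(seqs: List[str]) -> float:
    """Average number of differing sites over all pairs of sequences (k)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def theta_pi(seqs: List[str]) -> float:
    """Per-site nucleotide diversity: k divided by alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def tajimas_d(seqs: List[str]) -> float:
    """Tajima's D (Tajima, 1989); needs at least four sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D requires at least four sequences")
    s = sum(len(set(column)) > 1 for column in zip(*seqs))  # segregating sites
    if s == 0:
        return nan  # undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    k = mean_pairwise_differences(seqs)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def reduction_of_diversity(pi_wild: float, pi_landrace: float) -> float:
    """Fold decrease of diversity from the wild group to the landraces."""
    if pi_landrace == 0:
        return inf if pi_wild > 0 else nan  # assumption for invariant genes
    return pi_wild / pi_landrace


# Invented toy alignments, four sequences per group.
wild = ["AAAA", "AAAT", "AATT", "ATTT"]
landrace = ["AAAA", "AAAA", "AAAA", "AAAT"]

rod = reduction_of_diversity(theta_pi(wild), theta_pi(landrace))
print(theta_pi(wild), theta_pi(landrace), rod, tajimas_d(landrace))
```

In the toy landrace group the only variant is a singleton, which gives a negative Tajima's D, the direction expected when rare alleles are in excess.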

Identifying Selection Signatures at the SNP Level
The CDS of the seed size genes across the 25 resequenced genotypes were used to generate population statistics for every SNP using the R package PopGenome (Pfeifer et al., 2014). Specifically, a sliding window with a 1-bp window size and a 1-bp step size was used. θπ (Nei and Li, 1979), FST (Hudson et al., 1992), and Tajima's D (Tajima, 1989) for each SNP within the CDS were calculated using the diversity.stats, F_ST.stats, and neutrality.stats commands. Functional information was estimated by get.codons. RoD in the pairwise ancestor/descendant population comparison was calculated as the fold decrease in θπ in the landraces compared to the wild and weedy genotypes. To identify SNPs under purifying selection, the following criteria were used: (1) RoD in the pairwise ancestor/descendant population comparison should be greater than the average RoD based on 159 neutral loci; (2) FST should be positive; (3) Tajima's D should be negative.
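The three criteria can be expressed as a simple filter. The Python sketch below assumes the per-SNP statistics have already been computed (for example, exported from PopGenome); the `SnpStats` record, the SNP values, and the neutral RoD threshold of 1.5 are hypothetical placeholders, not values from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnpStats:
    position: int    # coordinate of the SNP within the CDS
    rod: float       # fold reduction of diversity, wild vs. landrace
    fst: float       # FST between the two groups
    tajima_d: float  # Tajima's D at this site


def shows_selection_signature(snp: SnpStats, neutral_mean_rod: float) -> bool:
    """Apply the three criteria: RoD above neutral mean, FST > 0, D < 0."""
    return (snp.rod > neutral_mean_rod
            and snp.fst > 0
            and snp.tajima_d < 0)


NEUTRAL_MEAN_ROD = 1.5  # placeholder; the actual neutral average is not given

snps: List[SnpStats] = [
    SnpStats(101, rod=4.0, fst=0.30, tajima_d=-1.2),   # passes all three
    SnpStats(164, rod=4.0, fst=-0.02, tajima_d=-1.2),  # fails the FST test
    SnpStats(230, rod=1.1, fst=0.25, tajima_d=-0.8),   # fails the RoD test
    SnpStats(305, rod=2.2, fst=0.10, tajima_d=0.4),    # fails the D test
]

selected_positions = [s.position for s in snps
                      if shows_selection_signature(s, NEUTRAL_MEAN_ROD)]
print(selected_positions)
```

A gene would then be counted as a selection candidate if at least one of its CDS SNPs passes the filter, which is how the SNP-level and gene-level results can differ for the same gene.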

mlHKA Test
A set of 63 seed size candidate genes under purifying selection was used as input, together with three random selections of 36 genes from the 159 neutral genes, for validation with the mlHKA test (Wright and Charlesworth, 2004). The mlHKA program was run under a neutral model, where numselectedloci = 0, and then under a selection model, where numselectedloci > 0. The number of cycles of the Markov chain was set to 100,000. For each random selection of 36 neutral genes, the program was run with three random number seeds (10, 20, and 30), giving 3 × 3 = 9 runs in total.
Significance was assessed by the mean log likelihood ratio test statistic, where twice the difference in log likelihood between the models is approximately chi-squared distributed with df equal to the difference in the number of parameters.
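The likelihood ratio test described above can be sketched in Python using only the standard library. The log-likelihood values and the degrees of freedom in the example are hypothetical, since the per-run values are not reported here; the chi-squared tail function uses a standard recurrence that is valid for integer degrees of freedom.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1, y) for even df or Q(1/2, y) for odd df, then apply
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, exp(-y)
    else:
        a, q = 0.5, erfc(sqrt(y))
    while a < df / 2.0:
        q += y ** a * exp(-y) / gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_neutral: float, lnl_selection: float,
                          df: int) -> tuple:
    """Return (statistic, p-value) for two nested models."""
    statistic = 2.0 * (lnl_selection - lnl_neutral)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log likelihoods for one run; df is the number of extra
# parameters in the selection model (one per locus allowed to be selected).
stat, p_value = likelihood_ratio_test(lnl_neutral=-512.4,
                                      lnl_selection=-505.1, df=3)
print(stat, p_value)
```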

Haplotype Analysis of Genes under Selection
Haplotype analysis was performed for genes under selection using the R packages pegas (Population and Evolutionary Genetics Analysis System; Paradis, 2010) and ape (Paradis et al., 2004). The functions haplotype, haploFreq and haploNet were called to generate haplotype maps. In addition to landraces and wild and weedy genotypes, accessions from improved lines, the Guinea margaritiferum race and S. propinquum were used in haplotype analyses (Table S2).
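The first step of such an analysis, collapsing identical sequences into haplotypes and tabulating their frequency per group, can be sketched in Python as follows. This is not the pegas implementation; the accession names, group labels, and sequences are invented, and network construction (the haploNet step) is not attempted.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def haplotype_frequencies(
    samples: List[Tuple[str, str, str]]
) -> Dict[str, Dict[str, int]]:
    """Map each distinct sequence to counts of accessions per group.

    `samples` holds (accession, group, aligned sequence) triples.
    """
    table: Dict[str, Counter] = defaultdict(Counter)
    for _accession, group, sequence in samples:
        table[sequence][group] += 1
    return {hap: dict(counts) for hap, counts in table.items()}


# Invented accessions and sequences for illustration.
samples = [
    ("acc1", "landrace", "ACGT"),
    ("acc2", "landrace", "ACGT"),
    ("acc3", "improved", "ACGT"),
    ("acc4", "wild_weedy", "ACGA"),
    ("acc5", "wild_weedy", "TCGA"),
    ("acc6", "wild_weedy", "ACGT"),
]

frequencies = haplotype_frequencies(samples)
for haplotype, counts in frequencies.items():
    print(haplotype, counts)
```

A haplotype shared by most cultivated accessions but rare among wild and weedy ones is the pattern that would be consistent with the diversity reduction described for genes under selection.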

RESULTS
Seed Size Candidate Genes in Sorghum
Based on a comprehensive literature survey, 129 genes associated with seed size were identified in three well-studied model species, including 65 genes in rice, 21 in maize and 43 in Arabidopsis (Table S1). By using the BBH method and the known syntenic relationships from the Plant Genome Duplication Database to infer orthologs (assembly v3.0), a total of 111 genes were identified in sorghum (Table 1). From the set of 65 seed size-related genes identified in rice, 55 orthologs were identified in sorghum using the BBH method and 47 using the syntenic relationship method. Of these, 30 orthologs were identified by both methods, resulting in a total of 72 unique orthologs identified in sorghum (Figure 1). Additionally, a total of 23 orthologs were identified in sorghum based on the 21 seed size-related genes from maize, including 20 BBH orthologs and 12 syntenic orthologs, with 9 orthologs identified by both methods. Finally, 25 sorghum orthologs were identified based on the analysis of the 43 selected seed size-related genes from Arabidopsis (Figure 1). Amongst all putative sorghum orthologs, 9 were in common across a minimum of two species, leading to 111 unique orthologs in sorghum identified as seed size candidate genes (Figure 1). Four seed size candidate genes in sorghum reported by Zhang et al. (2015), one of which overlapped with the 111 seed size orthologs, were also taken into consideration, resulting in a final list of 114 seed size candidate genes.
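The counts in this paragraph follow from simple inclusion-exclusion, which the short Python check below reproduces. It assumes that each of the 9 orthologs shared across species was counted in exactly two species; the text says "a minimum of two", so this is an inference from the reported total of 111.

```python
def union_size(n_first: int, n_second: int, n_both: int) -> int:
    """Size of the union of two sets given their sizes and their overlap."""
    return n_first + n_second - n_both


# Orthologs per source species: BBH count, syntenic count, found by both.
rice_orthologs = union_size(55, 47, 30)
maize_orthologs = union_size(20, 12, 9)
arabidopsis_orthologs = 25  # reported directly, without a per-method split

# Assumption: each of the 9 shared orthologs was counted in two species.
shared_across_species = 9
unique_orthologs = (rice_orthologs + maize_orthologs
                    + arabidopsis_orthologs - shared_across_species)

# Four additional candidates from Zhang et al. (2015), one already included.
final_candidates = unique_orthologs + 4 - 1

print(rice_orthologs, maize_orthologs, unique_orthologs, final_candidates)
# prints: 72 23 111 114
```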
The 114 identified seed size candidate genes were unevenly distributed across the 10 sorghum chromosomes, ranging from 23 genes located on chromosome 1 to only 2 genes located on chromosome 5. Whole genome expression data from the study by Davidson et al. (2012) were used to investigate the differential expression of the 114 candidate genes. A total of 22 genes exhibited differentially high levels of expression in the seed (Table S3).

Genetic Diversity in Seed Size Genes in Sorghum
Sequence data for all 114 candidate genes were extracted from a previously described set of wild and weedy genotypes and landraces (Table S2; Mace et al., 2013). Overall, the selected genes exhibited a wide range of variation in sequence diversity in both genotype groups (the wild and weedy genotype group and the landrace group), with diversity measures (θπ) varying from 0.0085 (Sobic.002G311000) to 0 (Sobic.003G380900) in the wild and weedy genotypes, and from 0.0070 (Sobic.004G317300) to 0 (Sobic.003G035400, Sobic.003G380900, Sobic.004G065400, and Sobic.006G059900) in the landraces (Table S4). The SERF1 (a negative regulator of seed filling in rice) ortholog, Sobic.003G380900, was invariant in all the genotypes included in the current study. The sequence diversity observed in the seed size candidate genes in the wild and weedy genotypes was not significantly different from the genome-wide averages. However, the seed size candidate genes in the landraces were significantly less diverse than the genome-wide averages (p = 0.026, t-test) (Figure 2A) and were significantly less diverse than in the wild and weedy genotypes (p = 3.68E-11, paired t-test). The RoD in the seed size candidates between the two genotype groups during domestication was greater than that of 159 neutral genes identified in a previous study (Mace et al., 2013; Table S5, Figure 2B). The degree of population differentiation between the landrace and wild and weedy genotypes, measured by the fixation index FST, was significantly higher for the seed size candidate genes than for the neutral genes (Figure 2C).
FIGURE 1 | One hundred eleven orthologs of seed size genes identified in sorghum. Both the BBH method and the known syntenic relationships were used to identify orthologs of previously identified seed size genes in Arabidopsis (43), maize (21), and rice (65). The black arrows indicate BBH-identified orthologs, while the red arrows indicate syntenic orthologs.
Furthermore, the extent of RoD varied among the seed size candidate genes. Two genes, Sobic.006G059900 (ZmIPT2 ortholog) and Sobic.003G035400 (GW5 ortholog), were invariant in the landrace genotypes, despite having high levels of sequence diversity in the wild and weedy genotypes. The signature of significantly reduced sequence diversity in the landrace group, in comparison to the wild and weedy group, was also observed in four other genes, with RoD ranging from 15- to 58-fold: Sobic.003G030600 (58-fold decrease), Sobic.003G277900 (25-fold decrease), Sobic.007G149200 (20-fold decrease), and Sobic.003G230500 (15-fold decrease). A contrasting signature of increased sequence diversity in the landraces was observed for 16 seed size candidate genes, including Sobic.004G237000, a syntenic ortholog of PGL2, with θπ of 0.0048 in the landrace genotypes in comparison to just 0.0021 in the wild and weedy genotypes. In addition to reduced sequence diversity in the landraces, a more skewed allele frequency, as determined through a negative Tajima's D value, was observed in the majority of cases.
FIGURE 2 | Sequence variation identified in the seed size candidate genes in sorghum. (A) A comparison of sequence diversity (θπ) between the seed size candidate genes (red) and genome-wide averages (blue) in both the landrace and wild and weedy groups. Error bars indicate the standard error; * indicates a significant difference (p < 0.05, t-test) between the groups. (B) Box-plots showing the distributions of the reduction of sequence diversity (fold) of 114 seed size candidate genes (red) and 159 neutral genes (blue) during domestication. The p-value was calculated based on a t-test. (C) Box-plots showing the distributions of FST between the landrace and wild and weedy genotype groups for 114 seed size candidate genes (red) and 159 neutral genes (blue). The p-value was calculated based on a t-test.

Signatures of Selection in Seed Size Candidate Genes
Based on the genome-wide thresholds for the gene-level rankings described in Mace et al. (2013), 6 seed size candidate genes were identified with signatures of purifying selection during sorghum domestication (Table S6). Previous studies (Whitt et al., 2002; Brugiere et al., 2008; He et al., 2011; Hufford et al., 2012; Jiao et al., 2012; Xu et al., 2012; Luo et al., 2013; Weng et al., 2013; Wills et al., 2013; Lang et al., 2014; Zuo and Li, 2014; Sosso et al., 2015; Si et al., 2016) revealed purifying selection signals in 7 maize and 9 rice seed size genes included in this study (Table S1). Twenty-one orthologs were identified in sorghum from 15 of the 16 genes under selection in either maize or rice; however, only one of them, Sobic.006G059900 (ZmIPT2 ortholog), was identified with signatures of purifying selection in sorghum based on the gene-level rankings (Table S6).
To investigate the domestication signature in the 114 sorghum seed size candidate genes at a higher resolution, signatures of purifying selection at the SNP level were analyzed. In total, 2,317 SNPs were identified in the CDS of all 114 candidate genes, consisting of 1,202 synonymous SNPs and 1,115 non-synonymous SNPs. In addition to sequence diversity (θπ) metrics, FST, Tajima's D, and RoD during domestication were calculated for each SNP. Based on the specified criteria regarding these metrics (see Materials and Methods), 283 SNPs from 63 genes were identified with signatures of purifying selection, including Sobic.003G406600 (GW8 ortholog), Sobic.008G100400 (SMK1 ortholog), and Sobic.009G053600 (GS5 ortholog). Out of the 63 genes under selection, 42 contained non-synonymous SNPs under selection (Table S7). The selection signatures identified at the SNP level included 5 out of the 6 genes under selection at the gene level.
To validate whether the 63 selection candidates displayed patterns of genetic variation consistent with purifying selection, the mlHKA test was employed. A model of directional selection best explained the patterns of polymorphism observed relative to 159 neutral loci (mean log likelihood ratio test statistic = 661, P < 7.49E-94 for all comparisons, Table S8). Additionally, out of 22 seed size candidates exhibiting differentially high levels of expression in the seed, 17 (77%) were under selection. This percentage is significantly higher than that among the remaining 92 seed size genes not exhibiting differentially high levels of expression in the seed, of which only 46 (50%) were found to be under selection (χ2 = 6.546, p-value < 0.05), indicating that seed size genes highly expressed in the seed are more likely to have been targeted during domestication.
Frontiers in Plant Science | www.frontiersin.org
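The text does not state how the χ2 test on expression and selection was set up. As a hedged check in Python, one set-up that gives a value close to the reported statistic is a goodness-of-fit test of the 17:5 split against the 50% rate seen in the remaining 92 genes; a standard 2 × 2 test of independence on the same counts gives a smaller statistic, although both fall below p = 0.05 with one degree of freedom. Which of these, if either, matches the original analysis is an assumption.

```python
from math import erfc, sqrt
from typing import List


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return erfc(sqrt(x / 2.0))


def goodness_of_fit(observed: List[int], expected_props: List[float]) -> float:
    """Pearson chi-squared of observed counts against expected proportions."""
    total = sum(observed)
    return sum((o - total * p) ** 2 / (total * p)
               for o, p in zip(observed, expected_props))


def independence_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts from the text: 17 of 22 seed-expressed genes under selection,
# 46 of the remaining 92 genes under selection.
rate_rest = 46 / 92  # 0.5

gof_stat = goodness_of_fit([17, 5], [rate_rest, 1 - rate_rest])
ind_stat = independence_2x2(17, 5, 46, 46)

print(round(gof_stat, 3), chi2_sf_1df(gof_stat))  # statistic is 72/11
print(round(ind_stat, 3), chi2_sf_1df(ind_stat))
```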

Parallel Domestication of Seed Size in Cereals
Seed size genes under selection across species were also identified. Among 15 seed size genes under selection in maize or rice, 12 were also found to be under selection in sorghum based on the SNP level CDS analysis in this study. A broader investigation of parallel domestication selection signals across syntenic orthologs of all the 114 seed size candidate genes in maize (Hufford et al., 2012; Jiao et al., 2012) and rice (He et al., 2011; Huang et al., 2012a; Xu et al., 2012) identified 30 seed size candidate genes in sorghum that have orthologs under selection in maize and/or rice (Table S6). Among these 30 sorghum genes, only one gene was under selection based on the gene level analysis, but 21 genes were identified as being under selection based on the SNP level CDS analysis (Table S6, Figure 3), with 4 of the 9 remaining genes having paralogs under purifying selection in sorghum. The sorghum seed size candidate genes under selection in multiple cereals included Sobic.009G070000 (GW5 ortholog), Sobic.003G406600 (GW8 ortholog), Sobic.007G101500 (Bt2 ortholog), Sobic.K041100 (GIF1 ortholog), and Sobic.005G001500 (PBF1 ortholog).

DISCUSSION
Seed size is a typical domestication syndrome trait, with cultivated cereal crops having larger seeds in comparison to their wild progenitors (Doebley et al., 2006). During domestication, large seeded genotypes were selected for their contribution to increased grain yield, but perhaps more importantly also for their positive effect on the quality of end-use products. Utilising the power of whole genome sequencing of diverse sorghum germplasm at the SNP level, combined with comparative genomic analysis of well researched cereal crops such as rice and maize, we identified 114 seed size candidate genes in sorghum.
Signatures of domestication were identified in over half (63) of these genes through SNP level analysis of the CDS regions, with a high degree of concordance of seed size candidate genes under selection across species observed. Additionally, a group of seed size candidate genes that exhibited differentially high levels of expression in the seed were found to be more likely under selection during domestication. These results provide new insights into the genetic control of seed size in sorghum and the domestication of the seed size trait in cereal crops.
FIGURE 3 | Venn diagram showing the number of seed size genes under selection across species: sorghum (blue), maize (green), and rice (red). Seed size candidate genes under selection in sorghum were identified based on SNP analysis in sorghum, while selection signals on their orthologs in maize and rice were extracted from previous studies.
Candidate genes included in this study provide a useful entry point into investigating the genetic factors controlling seed size. An understanding of genetic diversity and evolutionary pressures on these seed size candidate genes in sorghum provides potential targets for manipulating seed size via marker-assisted selection or genome editing. In particular, intrinsic seed size genes may prove more amenable to relatively simple interventions than genes which affect seed size indirectly, for example via grain number.

Seed Size Candidate Genes under Selection Are More Likely to be Intrinsic Seed Size Genes Rather than Pleiotropic Seed Size Genes
Of the 111 orthologs identified in sorghum based on seed size genes from maize, rice, and Arabidopsis, only 9 orthologous genes were identified as being associated with seed size in more than one species (Figure 1). This limited overlap suggests that the sample of seed size genes identified to date in each species is incomplete and/or that the genetic factors influencing seed size vary among species. This is likely to be due to the complexity of the genetic control of seed size, which is controlled by factors involved in intrinsic seed size, such as cell number, cell size, structure and composition, and by physiological factors involved in the carbohydrate supply-demand balance and transport. Given the differences in plant architecture and physiology across the four species, it seems likely that genes under selection in sorghum that have also been identified as seed size genes in more than one species either affect intrinsic seed size or directly affect seed number through an effect on panicle architecture, rather than affecting seed size via carbohydrate supply or indirectly affecting seed number. Both situations occurred in this study: Sobic.001G341700, the ortholog of GS3 and ZmGS3, directly influences cell number in the seed, whereas Sobic.002G216600, the ortholog of DEP1 and AGG3, changes panicle branching and therefore seed number (Huang et al., 2009; Mao et al., 2010; Chakravorty et al., 2011; Li S. et al., 2012).
Of the 63 seed size candidate genes identified as being under selection in sorghum, 21 were identified as being under selection in at least one of the other species (Table S6). Genes that exhibited differentially high levels of expression in the seed are more likely to be associated with intrinsic variation for seed size. Our data show that these genes were much more likely to be under selection during domestication. This provides support for the hypothesis that the modification to seed size during domestication preferentially targeted genes for intrinsic seed size rather than genes that indirectly impact seed size.

Base Pair Level Analysis Provides a High Resolution Approach to Study Domestication Signatures on Seed Size Genes
Domestication has shaped sorghum into a productive crop from a wild grass. Previous studies in sorghum have identified thousands of genes underpinning sorghum domestication based on whole genome analyses (Mace et al., 2013; Morris et al., 2013). This study detected selection signals in 63 seed size candidate genes in sorghum identified from cross species analyses based on individual nucleotide level analyses. The nucleotide level analyses provide greater resolution to study domestication signatures than whole gene level rankings. In general, when genes are under strong purifying selection, the gene level analysis may provide sufficient power to identify the signature of selection. For example, in Sobic.009G049400, the ortholog of SRS3 conferring a round seed phenotype in rice (Kitagawa et al., 2010), 44% of the SNPs were identified with signatures of purifying selection (Figure 4A). The majority of the remaining SNPs in this gene also exhibited the same trend of sequence diversity patterns, resulting in this gene being identified as under purifying selection at both the gene and nucleotide levels (Figure 4C; Mace et al., 2013). However, during domestication, contrasting selections can be imposed on different mutant loci of the same gene (particularly genes with pleiotropic effects) at different times, which results in a gene with chimeric positive and purifying selection signals (Purugganan and Fuller, 2009; Campbell et al., 2016). This situation was observed in this study, where 11 SNPs in the SRS5 ortholog, Sobic.001G107100, clustering within 50 bp of each other, were identified with signatures of purifying selection (Figure 4B). However, the gene was not identified as being under selection based on the gene level analysis due to the heterogeneous sequence diversity patterns observed across the entire gene length (Figure 4D).
In such cases, analyzing each mutant locus separately provides increased resolution to identify the selection signature in comparison to gene level analysis in which contrasting selection signals within the same gene may cancel each other out.

Common Seed Size Genes under Selection across Cereals Supports Parallel Domestication of Seed Size in Grass Cereals
During crop domestication, human demands have led to a similar suite of traits being changed across a wide range of crops, a phenomenon known as convergent domestication (Lenser and Theißen, 2013). However, whether the same genetic basis underlies parallel changes in different species is still under debate. Early QTL mapping studies found close correspondence of QTLs for seed size, shattering, and flowering time across cereal crops (Paterson et al., 1995), with subsequent detailed QTL analyses identifying high levels of concordance in flowering time QTLs across sorghum and maize (Mace et al., 2013). Recently, Sh1, a major QTL controlling shattering, and HD1, a major locus conferring flowering time, have been reported to be under parallel selection in multiple cereals. In this study, among 15 seed size genes previously identified to be under selection in rice or maize, 12 were shown to have orthologs in sorghum under selection during domestication. Genes under parallel selection have been found to be major effect loci of seed size explaining a large proportion of the phenotypic variation (Lenser and Theißen, 2013). The significant overlap of selection signatures on seed size genes in cereals provides support for the role of parallel domestication.
FIGURE 4 | ... Table S2. Colour-coding as follows: improved inbred lines (pink), landraces (red), wild and weedy genotypes (blue), S. propinquum (green), and guinea margaritiferums (purple). The size of the circles in the haplotype networks is proportionate to the number of accessions with that haplotype. The branch length represents the genetic distance between two haplotypes.

CONCLUSIONS
Seed size and weight are physiologically complex traits controlled by many loci, some of which have been selected during the domestication of cereals. In this study, we have collated a large number of genes controlling seed size and weight across three extensively studied plant model species and identified their sorghum orthologs using comparative genomics analyses. We demonstrated that domestication in sorghum has left genetic signatures of selection on multiple seed size candidate genes. For a number of the seed size genes we found signatures of selection that were common across sorghum, maize and rice, consistent with parallel domestication of the seed size trait. We also found that seed size candidate genes that exhibited differentially high levels of expression in the seed were more likely to be under selection during domestication. Our work sheds light on the processes involved in cereal domestication and provides potential targets for breeding to increase seed size and potentially yield.

AUTHOR CONTRIBUTIONS
DJ, EM, and IG conceived and designed the experiments; YT, AC, EM, DJ, and XZ collected data; YT, ST, BC, EV, JB, DJ, and EM analyzed data; YT and EM wrote the manuscript; EV, JB, IG, and DJ revised the manuscript. All authors read and approved the final manuscript.

ACKNOWLEDGMENTS
We acknowledge access to background IP from Grains Research and Development Corporation and support from the University of Queensland and the Department of Agriculture and Fisheries Queensland.