Genome-Wide Single-Nucleotide Polymorphisms in CMS and Restorer Lines Discovered by Genotyping Using Sequencing and Association with Marker-Combining Ability for 12 Yield-Related Traits in Oryza sativa L. subsp. Japonica

Heterosis or hybrid vigor is closely related with general combing ability (GCA) of parents and special combining ability (SCA) of combinations. The evaluation of GCA and SCA facilitate selection of parents and combinations in heterosis breeding. In order to improve combining ability (CA) by molecular marker assist selection, it is necessary to identify marker loci associated with the CA. To identify the single nucleotide polymorphisms (SNP) loci associated with CA in the parental genomes of japonica rice, genome-wide discovered SNP loci were tested for association with the CA of 18 parents for 12 yield-related traits. In this study, 81 hybrids were created and evaluated to calculate the CA of 18 parents. The parents were sequenced by genotyping by sequencing (GBS) method for identification of genome-wide SNPs. The analysis of GBS indicated that the successful mapping of 9.86 × 106 short reads in the Nipponbare reference genome consists of 39,001 SNPs in parental genomes at 11,085 chromosomal positions. The discovered SNPs were non-randomly distributed within and among the 12 chromosomes of rice. Overall, 20.4% (8026) of the discovered SNPs were coding types, and 8.6% (3344) and 9.9% (3951) of the SNPs revealed synonymous and non-synonymous changes, which provide valuable knowledge about the underlying performance of the parents. Furthermore, the associations between SNPs and CA indicated that 362 SNP loci were significantly related to the CA of 12 parental traits. The identified SNP loci of CA in our study were distributed genome wide and caused a positive or negative effect on the CA of traits. For the yield-related traits, such as grain thickness, days to heading, panicle length, grain length and 1000-grain weight, a maximum number of positive SNP loci of CA were found in CMS A171 and in the restorers LC64 and LR27. On an individual basis, some of associated loci that resided on chromosomes 2, 5, 7, 9, and 11 recorded maximum positive values for the CA of traits. From our results, we suggest that heterosis in japonica rice would be improved by pyramiding the favorable SNP loci of CA and eliminating the unfavorable loci from parental genomes.

Heterosis or hybrid vigor is closely related with general combing ability (GCA) of parents and special combining ability (SCA) of combinations. The evaluation of GCA and SCA facilitate selection of parents and combinations in heterosis breeding. In order to improve combining ability (CA) by molecular marker assist selection, it is necessary to identify marker loci associated with the CA. To identify the single nucleotide polymorphisms (SNP) loci associated with CA in the parental genomes of japonica rice, genome-wide discovered SNP loci were tested for association with the CA of 18 parents for 12 yield-related traits. In this study, 81 hybrids were created and evaluated to calculate the CA of 18 parents. The parents were sequenced by genotyping by sequencing (GBS) method for identification of genome-wide SNPs. The analysis of GBS indicated that the successful mapping of 9.86 × 10 6 short reads in the Nipponbare reference genome consists of 39,001 SNPs in parental genomes at 11,085 chromosomal positions. The discovered SNPs were non-randomly distributed within and among the 12 chromosomes of rice. Overall, 20.4% (8026) of the discovered SNPs were coding types, and 8.6% (3344) and 9.9% (3951) of the SNPs revealed synonymous and non-synonymous changes, which provide valuable knowledge about the underlying performance of the parents. Furthermore, the associations between SNPs and CA indicated that 362 SNP loci were significantly related to the CA of 12 parental traits. The identified SNP loci of CA in our study were distributed genome wide and caused a positive or negative effect on the CA of traits. For the yield-related traits, such as grain thickness, days to heading, panicle length, grain length and 1000-grain weight, a maximum number of positive SNP loci of CA were found in CMS A171 and in the restorers LC64 and LR27. On an individual basis, some

INTRODUCTION
Hybrid rice breeding has been successfully adopted in 27 ricegrowing countries. China is the largest producer and consumer of hybrid rice, and considers itself a pioneer in hybrid rice breeding. With the cultivation of hybrid rice, China obtained 20% more grain yield compared to their best conventional cultivars (Yuan and Virmani, 1988;Cheng et al., 2004). In China, a total of 33 million hectares (ha) of paddy fields are available for rice cultivation, in which 25 million ha are occupied by the indica rice and 8 million ha are occupied by the japonica rice. In the total land of indica rice, 70% are used for indica hybrids rice cultivation, whereas only 5% of the total planted land is used for japonica hybrid cultivations. (Tang et al., 2008). In the past, great achievements were made to increase the growing area and yield productivity of japonica hybrids by improving culturing practices with the use of productive seeds. However, until recently, the yield potential of japonica hybrids has been low and unstable compared to the indica hybrids. Previous breeding practices have confirmed that japonica hybrids encounter several challenges due to their low standard heterosis, which may be caused by their narrow genetic background, their large panicle size with insufficient filling (poor grain plumpness), or the unavailability of elite parental combinations for developing superior hybrids (Xie and Hardy, 2009).
Commercially developed hybrid cultivars exhibit desirable characteristics of yield and quality, that are inherited from their parental lines (Cao and Zhan, 2014). Such parental characteristics are heritable and could be appear in the F 1 generation in the form of heterosis. The successful selection of proper parents, based on the combining ability of yield-related traits, contributes to yield outcomes in hybrid breeding. The combining ability is a powerful breeding test that estimates the breeding values of parents and crosses in terms of general and specific combinations (Sprague and Tatum, 1942). Conventionally, the combining ability of crossing parents is calculated by evaluating all their developed crosses, which is laborious, tedious and time-consuming (Smith et al., 2008). Moreover, when the number of parents involved in combining ability manipulation become large, their hybrids affect the experimental feasibility (Bertan et al., 2007).
With recent rapid developments in molecular marker technology, it is now feasible to genotype or identify marker loci for CA. Previous studies made this possible and discovered a genetic basis of CA with SSR markers (simple sequence repeats) (Liang et al., 2010;Huang et al., 2013;Liu E. B. et al., 2013;Qi et al., 2013;Xie et al., 2016). These studies have revealed several SSR marker loci associated with the CA of yield and quality traits. However, such studies have been restricted to SSR markers. To date, no SNP base analysis has been reported for the discovery of SNP locus/loci associated with the CA of parental traits.
Recent advances in next-generation sequencing (NGS) have facilitated genome-wide SNP identification and SNP characterization. The advent of NGS has reduced the cost of genome sequencing to a level, where GBS is now considered a powerful tool for inquiring into a large number of genomic variations (SNPs). The GBS platform is simple, fast and accurate and has been widely practiced for large scale mining of SNPs in many crop species, including wheat , rice (Spindel et al., 2013), soybean (Sonah et al., 2013), chickpea (Kujur et al., 2015), sorghum (Morris et al., 2012) alfalfa (Yu L. X. et al., 2016), and olive (İpek et al., 2016).
SNPs are the most common and abundant sequence variants present in both plant and animal genomes (Kwok, 2001). They are present across the genome, within coding, non-coding and intergenic regions of the genome (Jiang, 2013). They gained tremendous importance as a third-generation molecular marker for a wide range of biological applications of marker-assisted and genomic selection, associations and QTL mapping, haplotype and pedigree analysis and cultivar identification (Lee et al., 2008;McCouch et al., 2010;Subbaiyan et al., 2012).
To this end, our study will appear as a first and new approach for the discovery of selective SNP loci of CA. Prior studies have revealed that CA is a complex quantitative trait (Liu C. et al., 2015). Therefore, the identification of marker loci within the parental genomes of japonica hybrid rice can ultimately be utilized to understand the biological and genetic factors of CA. Here, we aim to develop a reliable strategy to select and determine the superior parents of hybrid rice based on the presence of CA loci, which can be achieved by using SNP data with combinations of SNP and phenotype data. The main objectives of this study were as follows: (1) to dissect japonica parents for genomewide SNPs; (2) to evaluate japonica parents for GCA effect; and (3) to associate the identified SNPs of CMS and restorer lines with the CA of parents to determine genome-wide favorable or unfavorable SNP loci of CA related to yield-related traits.

Library Construction and Sequencing
The total DNA of 18 japonica rice was rice extracted from fresh leaves (10-20 mg) selected at the seedling stage using the Qiagen DNeasy Plant Mini Kit. The experimental procedure for DNA isolation was the same as that previously described (Stein et al., 2001). Following the recommended protocol of GBS, optimized by Poland et al., a restriction digest buffer NEB Buffer 4 and a core set of restriction enzymes, PstI (CTGCAG) and MspI (CCGG), were added to each sample to obtain a uniform distribution of cut sites across the rice genome . The DNA library fragment size and library concentration were analyzed using a bioanalyzer (Agilent) machine (Parson et al., 2013). To inactivate the enzymes, the samples were incubated twice at 37 • C for 2 h and at 65 • C for 20 min. A set of already developed adapters with 500 ng of high-quality DNA was attached to the sample ends preceding the ligation reaction. The ligated samples were pooled and selected on a 1.5% agarose gel (200-300 bp) followed by PCR-amplification in a single tube. The resulting fragments in the DNA library were sequenced using the Ion Torrent (PGM) sequencing machine with Ion Torrent kits (PGM 200 Kit v2) following the manufacturer specifications (Life Technologies, Carlsbad, CA, and U.S.A.). The sequencing data obtained in our experiment have been deposited into the NCBI short read archive (SRA) under the study accession number:SRR2758809.

Short Read Mapping, SNP Discovery and Annotation
The FASTQ format sequence reads obtained from the sequencing machine were treated for quality control. For this, the adopter sequences and low quality reads were analyzed and trimmed. The cleaned sequences of all 18 samples were aligned onto the Nipponbare genome (IRGSP-1.0) (http://rapdb.dna.affrc.go.jp/) with bowtie 2 software (Langmead, 2010). The -M4 -verysensitive local parameter for mapping was adopted (-D 20 -R 3 -N 0 -L 20 -i S,1,0.50), where if the alignment is greater than 4, it will output only 1 alignment at a random level. The reads that mapped successfully once onto a reference genome were considered to be uniquely mapped reads, whereas the reads mapped onto multiple locations were considered to be multiplemapped reads. The short reads that could not be mapped onto any part of the reference were called unmapped reads. The GBS data were further used for SNP discovery using the Tassel-GBS application with default parameters (Glaubitz et al., 2014). The SNPs that evolved repeatedly were deleted from the final results.
The physical locations of the discovered SNPs were annotated by SNPEff software (Cingolani et al., 2012). The SNPs that were situated within exon, intron and other corresponding regions were classified in detail. We also grouped the identified SNPs into different classes of polymorphisms, such as transition/transversions mutations and gene-base synonymous and non-synonymous nucleotide base mutations.

Field Experiment and GCA Calculation
In our experiment, all the plant materials belonged to Japonica rice, comprising nine CMS and nine restorer lines. These parental lines are widely used in China for the commercial breeding of three-line hybrid rice. For hybrid development, all the CMS and restorer lines were planted and manually crossed in a set of 9 × 9 combinations following the NCII mating design (North Carolina mating design II) (Shukla and Pandey, 2008). At the end of ricecropping season, a population of 81 hybrids was obtained. The resultant hybrid seeds were grown and evaluated the next year at the Jiangpu research station, Nanjing Agricultural University, Nanjing China, with randomized complete block designs in four replications. The seeds of hybrid and their parents (restorer lines, and maintainer lines instead of CMS lines) were sown in the rice nursery in May 10, 2015. After 30 days, the seedlings of each hybrid combination were transplanted to paddy fields. Each plot in the paddy contained 4 rows with 8 plants per row. Twenty centimeter row-to-row and plant-to-plant spacing for all hybrids and their parents was maintained. Recommended field management and cultural practices were applied accordingly to the local environment. At maturity, six plants from each plot were randomly selected to calculate the phenotypic data of 12 yieldrelated traits. The phenotypic data of the traits were tested for significance using the statistical method of analysis of variance, along with the coefficient of variation (CV %) as described by Gomez and Gomez (1984). The GCA effect values of each CMS and restorer line for the traits were calculated by using the phenotypic data of the developed hybrids, as per the Griffing model (1956), following method-2 (Mo, 1982). The significance of the parental GCA value was tested with an LSD test at a P < 0.01.

Association Analysis
To identify the SNP locus/loci associated with the CA of the yield-related traits, an association analysis between the identified SNPs and CA of 12 parental traits was performed using a computational software called CA screen 1.0 operated in the MATLAB language and developed by our laboratory * (Liang et al., 2010). The script of our association method follows the principle of single-marker analysis (SMA). This method of association led to the statistically significant identification of SNPs and their effect on CA in homozygous and heterozygous associations. Moreover, we explain the principle of this association model corresponding to the research article provided, where we developed 81 F 1 combinations by crossing nine CMS with nine restorer lines. Now, at a given locus of SNP, if the parental lines of 41 F 1 hybrids possess the heterozygous SNP genotype (for example A-G, A-T), and the parental lines of the remaining 40 F 1 hybrids possess the homozygous SNP genotype (A-A or G-G). Now, after association, if the average trait value of the 41 heterozygous crosses is significantly greater or less than the average trait value of the 40 homozygous crosses, then the SNP marker associated with the CA of the trait is significantly positive or negative. If the difference in the trait value in the heterozygous association is positive and we observed a positive effect on the CA of the trait, we consider this SNP locus to be a favorable associated marker genotype of the elite CA for the trait, and vice versa.
In our study, to avoid spurious associations, the significance of the associations was assessed by using a t-test at P < 0.01 (Pradeep et al., 2007). Furthermore, coefficient of determination (R 2 ) was calculated to determine the percentage of phenotypic variation explained by each SNP marker.

Selection of Functional SNPs
To determine the substantial number of functional SNPs, we selected associated SNPs based on their genomic positions. Only the SNP loci situated within the coding region of the genome (synonymous, non-synonymous and splice variants) were considered. Furthermore, in the NCBI, the database GENE was used to search the corresponding genes with associated SNPs for their nomenclature, chromosomal location, gene product, protein domain, and expected functions.

Short Read Mapping, SNP Discovery, and Distribution
Using a GBS assay, approximately 16 million (16.3 × 10 6 ) short reads were constituted, which comprised 4.7 GB of data. After removal of low quality reads, the sequenced data were mapped onto the Nipponbare reference genome. The mapping result indicated that 60.4% (9.86 × 10 6 ) of the short reads were uniquely mapped onto the 12 chromosomes of the Nipponbare genome. Approximately 17% (2.78 × 10 6 ) of the short reads were mapped more than once (Figure 1). The remaining 22.4% (3.67 × 10 6 ) of the short reads could not be mapped onto any part of the reference genome.
Within the mapped sequenced reads of the nine CMS and nine restorer lines, a total of 39,001 SNPs were discovered at 11,085 sites, including 13,186 and 25,815 in the CMS and restorer lines, respectively. Despite having the same coverage in the genomes, the SNPs identified in the restorer lines were double those in the CMS lines. Of the nine CMS lines, 13,186 SNPs were identified, where the number of SNPs within the CMS lines ranged from 365 in Aizhixiang A to 3412 in Chunjiang 18A (Supplementary Table 1). The discovered SNPs in the CMS lines were distributed non-randomly on the 12 chromosomes, where chromosome 11 occupied a maximum (2740) and chromosome 5 occupied a minimum (381) of SNP numbers (Figure 2). The density distributions of the discovered SNPs in the CMS and restorer lines were explored by calculating the SNP frequency within a 200 kb genomic region. The average SNP densities between the CMS lines per 200 kb window were 7.5. Likewise, the density distribution varied across the chromosomes, where chromosome 9 possessed the highest SNP densities of 18.3, while chromosome 5 possessed the lowest SNP densities of 3.2. This phenomenon was also found to be uneven within the individual chromosome. For example, chromosome 11 in the CMS lines was found to be dense (750 SNPs) from the 25 to 28.8 MB genomic region, but scant SNPs (48 SNPs) were observed from the 11 to 16 MB genomic region. Similarly, on chromosome 6, the 10 MB region from 1.8 to 11.8 contained 542 SNPs, but on the same chromosome, the region from 16.6 to 22.4 MB had only 53 SNPs. On chromosome 9, we observed 34 high-density SNP sites with >30 SNPs per 200 kb region (Figure 3).
The discovered SNPs in nine restorer lines were counted as 25,815. These SNPs ranged from 1508 in R4179 to 5397 in LR5 (Supplementary Table 2). Due to the different chromosomal lengths, the numbers of SNPs across individual chromosomes varied. For example, for all the restorer lines, chromosome 11 was found to be rich in SNP number with 4845 SNPs, whereas a low SNP number (464) was found on chromosome 4 (Figure 4). The average densities of the SNPs among the nine restorer lines were 15 SNPs per 200 kb genomic region. Such densities across chromosomes were non-randomly distributed.
FIGURE 1 | Classification of total short reads mapped onto the Nipponbare genome. The total number of reads (16.3 × 10 6 ) obtained through the GBS of 18 japonica rice is in the center of the circle. The number of unmapped reads (3.67 × 10 6 ) is shown in blue. The 9.86 × 10 6 short reads, in brown, are mapped uniquely onto the reference genome. The green circle represents the multiple (2.78 × 10 6 ) mapping of reads on chromosomes.
The highest (35.6) SNP density was observed on chromosome 11, whereas the lowest (2.7) was observed on chromosome 4. Such densities were also found to be uneven within the individual chromosomes. In chromosome 2, the region between 5 and 22.2 MB had an unusually high (2796 SNPs) SNP density, whereas the region between 26 and 32.4 MB had a very low (24 SNPs) SNP density. The same results were observed on chromosome 11, where a high (3508 SNPs) and long SNP interval was detected between the 16 and 28.4 MB genomic region ( Figure 5). A small (120 SNPs) and short interval was also found from the 0.4 to 4 MB region of the same chromosome. On chromosome 11 of the restorer lines, we identified a maximum of 57 high-density SNP sites, with >30 SNPs per 200 kb region.

Annotation of SNPs
The annotation of the Nipponbare rice was used to locate the distribution pattern of the identified SNPs within various genomic regions. Overall, a similar SNP distribution was observed in the CMS and restorer lines. Of the 13,186 SNPs discovered in the nine CMS lines, 7582 (57.5%) were found within intergenic spaces. The other 3212 (24.3%) and 2649 (20%) were in the intron and exon regions, respectively. Among the total SNPs in the coding regions, synonymous were counted as 1155 (8.7%), while non-synonymous were further divided into missense, 1214 (9.2%), and nonsense, 27 (0.2%) ( Table 1,  Supplementary Table 3). Moreover, the detected SNPs of the nine CMS lines were also categorized as transition (C/T and  Table 5). In our results, the average ratio between the transitions and transversions was 1.4. The number of transition substitutions was significantly higher than that of transversion substitutions in the identified SNPs of all nine CMS lines (Figure 6).

G/A) and transversion (G/T, T/A, A/C, C/G) nucleotide bases (Supplementary
Subsequently, the identified SNPs of the restorer lines were also annotated, where 14,970 (58%) were situated in the intergenic region of genome. However, 6226 (24%) and 5377 (20%) were detected within the intron and exon regions ( Table 2,   Supplementary Table 4). Of the SNPs in the coding regions, 2189 (8.4%) led to synonymous while 2711 (10.5%) led to nonsynonymous amino acid changes with a protein-altering effect on genes. The average ratio of the transitions and transversions in the identified SNPs was also 1.4 (Supplementary Table 6). The similar result of higher transition substitutions was also observed in the identified SNPs of all nine restorer lines (Figure 7).

Performance of the GCA of Parents for 12 Yield-Related Traits
The GCA performance of the 18 parents of hybrid rice for 12 yield-related traits is summarized in Table 3. Among the nine CMS lines, genotype Zhe04 exhibited the highest GCA effect value for plant height, whereas A171 had the highest GCA effect for days to heading and seed width. In terms of the panicle length, the number of panicles per plants and the number of grains per panicle, CMS line 863A was a good general combiner due to the highly significant and positive GCA effect value. CMS 18A recorded the maximum GCA effect value for seed thickness and grain yield per plot, whereas 90167A revealed the highest GCA effect value in terms of seed length and 1000-seed weight. Parent AizhixiangA and Chunjiang18A had the maximum GCA effect in terms of number of panicles per plant and seed setting rate, respectively ( Table 3).
The GCA assessment of the restorer lines indicated that LR27 recorded the maximum GCA effect value for plant height, whereas Shenhui254 recorded the maximum GCA effect value for days to heading. The restorer C4115 possessed higher GCA effect values for panicle length, seed thickness and seed-setting rate. In terms of seed length, seed width and 1000-seed weight, restorer Yanhui R8 recorded the highest GCA effect value. The parent LC64 exhibited a greater value for the number of spikelets per panicle, whereas LR5 exhibited a greater GCA effect for the number of filled grains per panicle and grain yield per plot. The restorer R4179 revealed the maximum GCA for the number of panicles per plant (Table 3).

Genome Wide Association Analysis for the Identification SNP Loci Associated with CA
The main objective of our study was to identify SNP locus/loci of CA via association analysis; the discovered 39,001 SNPs at 11,085 genomic positions were integrated with the parental CA of 12 yield-related traits at P < 0.01. We revealed a total of 362 SNP locus/loci with the CA of parental traits that caused a positive or negative effect on F 1 trait performances. The overview and detailed information of the identified SNP loci of CA for the following traits are presented in Figure 8 and Supplementary  Table 7.

Plant Height
Fifty-three SNP loci situated on 9 different chromosomes (Chr1 Chr2, Chr3, Chr4, Chr6, Chr7, Chr8, Chr11, and Chr12) displayed significant associations with the CA of plant height ( Table 4, Supplementary Table 7). Of these associated loci, 18 exhibited a positive and 35 exhibited a negative effect on the CA of plant height. The positively associated loci increased the CA by 6.6%, whereas the negatively associated loci reduced the CA by 11%. Among all the parents, the restorer LC64 possessed a maximum of 35 negative CA loci of plant height in its sequence genome ( Table 5). On an individual basis, SNP loci on chromosomes 11 and 7 with A/G and T/A alleles caused the most prominent positive effect on the CA of the trait, 8.8 and 7.9%, respectively.

Grain Width
Forty-four SNP loci were distributed over 11 different chromosomes and showed significant associations with the CA of grain width ( Table 4, Supplementary Table 7). Of them, 40 showed an increment (5.8%), whereas 4 exhibited a reduction (3.5%). The CMS Chunjiang 18A had a maximum of 34 favorable CA loci in the genome for the CA of grain width ( Table 5). On an individual basis, SNP loci on chromosomes 6 and 5 with A/T and T/C alleles caused the maximum positive effect on the CA of the trait, 11 and 7.6%, respectively.

Grain Length
Twenty-seven SNP loci distributed across 10 different chromosomes revealed significant associations with the parental CA of grain length ( Table 4, Supplementary Table 7). Among the associated loci, 17 conferred a positive effect of 5%, whereas 10 conferred a negative effect of 8%. For the number of associated SNP loci within the parents, the CMS A171 had a maximum of 13 favorable CA loci for grain length ( Table 5). On an individual basis, SNP loci on chromosomes 7 and 11 with C/T and G/A alleles caused the maximum positive effect on the CA of the trait, 6 and 5.9%, respectively.

Grain Thickness
Ten SNP loci on 8 different chromosomes (Chr2, Chr3, Chr4, Chr5, Chr6, Chr7, Chr10, and Chr11) revealed significant relationships with the CA of grain thickness ( Table 4, Supplementary Table 7). The effect of these loci on the CA explained that, 9 loci caused a 7.2% increment, whereas one locus caused a 4.9% reduction in the CA of grain thickness. Among the sequenced genome of the parents, the restorer LC64 exhibited a maximum of 5 positive CA loci ( Table 5). On an individual basis, SNP loci on chromosomes 7 and 12 with C/G and A/G alleles caused the maximum positive effect on the CA, increasing its value by 16.9 and 7.4%, respectively.

Grain Yield
Five SNP loci detected on 4 different chromosomes (Chr2, Chr3, Chr9, and Chr11) were found to be significantly associated with the CA of grain yield ( Table 4, Supplementary Table 7). Among them, one locus had a positive effect of 26.3%, whereas 4 loci had a negative effect of 27.5%. The restorer line LC64 contained a higher number of negative CA loci for grain yield ( Table 5). Among all the associated SNPs, an SNP locus on chromosome 2 with a G/A allele had a favorable (26.3%) effect on the CA value.

Days to Heading
Twenty-seven SNP loci situated on 9 different chromosomes (Chr1, Chr2, Chr3, Chr5, Chr6, Chr7, Chr8, Chr11, and Chr12) revealed significant associations with the CA of days to heading ( Table 4, Supplementary Table 7). Among them, 22 had a positive effect of 7.8% on the CA, whereas 5 caused a negative effect of 7.6% on the CA of the trait. We found that the restorer LC64  had a maximum number of positive CA loci in days to heading ( Table 5). On an individual basis, SNP loci on chromosomes 5 and 1 with A/G and G/C alleles contributed the maximum effect to the CA in terms of days to heading, 10.2 and 9.7%, respectively.

Panicle Length
One hundred and ten SNP loci distributed over all 12 chromosomes revealed significant associations with the CA of panicle length. Among them, 77 exhibited a positive (16.2%) contribution, whereas 33 exhibited a negative reduction in the CA ( Table 4, Supplementary Table 7). For all the sequenced parents, the restorer LR27 had a maximum number of 71 favorable and 7 unfavorable CA loci in the panicle length ( Table 5). On an individual basis, SNP loci on chromosomes 9 and 3 with A/C and T/C alleles showed the maximum positive effect of 34 and 25%, respectively.

Panicle Number Per Plant
Five SNP loci located on 4 different chromosomes (Chr2, Chr5, Chr9, and Chr11) showed significant associations with the CA of panicle number per plant (  (Table 5). Among all the SNP loci, the variants on chromosomes 5 and 9 having G/A and A/T alleles caused the maximum improvement in the CA of 39 and 31%, respectively.

Number of Spikelet's Per Panicle
Fifty-three SNP loci distributed over 10 different chromosomes were found to be significantly associated with the CA of number of spikelet's per panicle ( Table 4, Supplementary Table 7). Of these, 2 loci had a positive (3.7%) and 51 had a negative  (4.2%) effect on the CA of the trait. The restorer LC64 had a maximum of 47 negative CA loci ( Table 5). Two associated loci on chromosomes 11 and 2 with A/G and C/T alleles exhibited the maximum positive effect on the CA of 19 and 15%, respectively.

Number of Filled Grains Per Panicle
Seven SNP loci were found to be associated with the CA of number of filled grains per panicle. These loci were distributed on 4 different chromosomes (Chr1, Chr2, Chr4, and Chr9) ( Table 4, Supplementary Table 7). Of these, one locus contributed positively (20%), whereas the remaining 6 loci negatively affected (26%) the CA of the trait. It is noteworthy that, CMS 18A had a higher (5) number of negative CA loci in its genome of CA (Table 5). An SNP locus on chromosome 2 with a C/T allele caused a favorable increment in the CA value of the trait.

Seed Setting Rate
On chromosome 10, only one SNP locus was found to be associated with the CA of seed-setting rate (

GBS Assay for Genome Wide SNP Discovery
The exploitation of heterosis in rice has increased the global ricegrain yield. This has led to an increase in productivity (growth, size, development, fertility and yield) of the progeny over that of both homozygous parents and is a fascinating phenomenon (Hochholdinger and Hoecker, 2007). The application of heterosis has been applied not only to rice but also to other crops such as tomatoes (Williams and Gilbert, 1960), maize (Shull, 1908), wheat (Singh et al., 2004), and soybeans (Pandini et al., 2002). In plants, heterosis is considered to be a complex trait that involves the presence of potential genetic distances, combining abilities and heterotic patterns of the parents traits (Beck et al., 1990). Rice breeders emphasized that heterosis in F 1 s is caused by the presence of combining ability alleles, that are present at different  loci of crossing parents (Liu et al., 2002). They described how the combination of different alleles at a specific locus of crossing parents creates heterozygosity that results in the heterosis of offspring.
One part of our present study describes genome-wide SNP discovery in japonica rice. A large number of rice cultivars have been sequenced for SNP discovery to perform genetic and genomic experiments. However, only a few studies have been conducted on japonica rice (Nagasaki et al., 2010). Rice breeders have concluded that SNP discovery in japonica rice is challenging compared to that in indica rice, due to its close genetic relatedness and lower diversity (Glaszmann, 1987;Yang et al., 1994;Gao et al., 2005;Negrao et al., 2008). In our recent efforts, 18 parents of japonica rice were sequenced for SNP discovery, following optimized genotyping with a sequencing method. The use of two restriction enzymes based on a sequencing method of GBS is the most convenient, cost-effective and easy approach to the SNP discovery of large genome crop plants, including rice (Furuta et al., 2016). Nevertheless, this method also has the drawback of generating a large amount of missing data (Davey et al., 2011;Kim et al., 2016).
Here, the successful mapping of 60.4% of short reads onto the reference genome revealed 39,001 SNPs at 11,085 positions. The unique mapping of short reads plays a crucial role in the determination of SNPs (Nielsen et al., 2011). However, unmapping reduced the chances of SNP discovery, which eventually occurs due to genomic deletions during the process of sequencing (Arai-Kichise et al., 2011). In our study, the number of SNPs among the parents varied. We observed higher polymorphisms in the nine restorer lines compared to the CMS lines. Of the 18 sequenced parents, the CMS Aizhixiang A reported a minimum number of SNPs. An in-depth analysis of the sequence genome of Aizhixiang A indicated that 70% of its sequence genome was similar to that of the Nipponbare reference. Furthermore, 25% of the missing data were also found, which further affected the chances for SNP discovery. We also found that the SNPs of CMS and the restorer lines were non-randomly distributed across 12 chromosomes and revealed several SNP-rich and SNP-poor intervals. Such extended SNP regions were similar to those reported in rice, wheat, mei (Prunus mume sieb. et zucc) and peach cultivars (Ravel et al., 2006;Subbaiyan et al., 2012;Fresnedo-Ramírez et al., 2013;Sun et al., 2013). In our study, there were also non-uniform patterns of SNPs within individual chromosomes, particularly at the terminal region of chromosome. We observed this situation at the terminal region of chromosome 11 for both the CMS and restorer lines. This may possibly arise due to the low variant rate at the centromere region of the chromosome, where recombination are typically more rare than at the chromosome terminal (Guo W. et al., 2007).

Annotation of SNPs
In the sequenced genome of animals and plants, SNPs do not present equally throughout the genome. Some parts of the genome revealed more SNPs than others. The discovered SNPs in our study were annotated according to their positions in the genome. The annotation of identified SNPs indicates that more than half (57%) of the SNPs are within the intergenic region, whereas 24.4% of the SNPs have been found within the intron region of the genome. Only 20.4% of the SNPs occurred inside the exon regions, with 8.6% synonymous and 9.9% non-synonymous in nature. The number of SNPs in the exon region was 37% less than that in the intergenic regions. This is possibly, because of the higher frequency of polymorphisms evolving in intergenic spaces rather than coding sequences . However,  the SNPs situated inside intergenic regions are also prominent, where there may have been the possibility of the presence of some functional alleles or genes (Salvi et al., 2007). In our study, the ratio of transitions to transversions was 1.4. This ratio was in line with former sequencing experiments in flax (Kumar et al., 2012).

Evaluation of Parents for GCA
The concept of combining ability (GCA and SCA) was developed by Sprague and Tatum in 1942. They characterized the breeding values of inbred lines by testing and evaluating them by combining ability evaluation trails (Sprague and Tatum, 1942). In every hybrid-breeding program, a breeder's goal is to select parents with good general combining ability and crosses that have high specific combining ability in their traits. From a genetic point of view, GCA evolved under additive and additive × additive gene action, whereas SCA evolved under non-additive gene action. The GCA of inbred parents is considered more important than the SCA, in which the parents were involved in random mating. The GCA evaluation of 18 parents of japonica hybrid rice indicated that most of the CMS lines exhibited maximum and positive GCA effect values for their studied traits. Previously, the same results of good general combining ability in the CMS lines were also reported (Singh et al., 1996). In contrast, most of the restorer lines also revealed greater GCA effect for their traits. The evidence of a higher GCA of restorer lines was also found in earlier studies (Rogbell et al., 1998).

SNP-CA Associations
With the discovery of genome-wide SNPs, several SNP-trait associations were reported in plant species, including maize, wheat, cotton, soybean, barley and Arabidopsis thaliana (Atwell et al., 2010;Yang et al., 2010;Pasam et al., 2012;Zhang et al., 2015;Su et al., 2016). In the past, most of the association studies on rice have focused on the improvement of yield-related traits Zhao et al., 2011). Very few SNP associations with CA of traits have been performed.
Here, we studied the heterozygous combinations of nucleotide bases (A-T/G-C) at a locus and observed its associations with the CA of traits. Moreover, we worked directly with the heterozygous association group and revealed 362 CA loci of 12 yield-related traits of japonica rice. For those results, each of the CMS and restorer lines contained a number of favorable or unfavorable SNP locus/loci of CA in the genome that caused a positive or negative effect on the traits. Few of the associated SNP loci were found to be co-associated with the CA of more than one yieldrelated trait. Among the CMS lines, 18A had a maximum number of negative SNP loci of CA for panicle and grain-related traits; whereas CMS A171 had a maximum number of positive SNP loci of CA for grain thickness and day to heading. For the restorer lines, LC64 and LR27 contained maximum numbers of positive SNP loci of CA for most of the studied traits. Previously, in maize, several favorable and unfavorable marker loci of CA were detected for five yield-contributing traits . Of these, the positive improved, while the negative decreased, the CA of the traits. Similarly, Qu et al. identified the CA loci of agronomical traits in rice by using QTL mapping of BCRILs (Qu et al., 2012). Moreover, all of the identified and associated SNP loci in our study were spread genome wide, with some on chromosomes 2, 5, 7, 9, and 11 exhibiting a maximum positive effect for the CA of traits. The correct identification of SNPs based on annotation results made them informative. Previous studies have explained that the SNPs within the coding region can alter the promoter activities of gene expression, transcription and translation capability (LeVan et al., 2001;Alonso-Blanco et al., 2005;Ng and Henikoff, 2006;Shastry, 2009;Batley, 2015). Such identification and characterization of gene-based SNPs could be directly utilized in any breeding program regarding crop improvement (Wang et al., 2008;Huang et al., 2010). Among the 362 SNP loci detected in this study, 32 SNP loci for CA of seven traits were situated within the reported genes (Supplementary Table 8). For CA of plant height, we found 7 SNP loci within previously identified genes. These genes encode proteins for photosynthesis, hormonal activity, lipid binding and transporter activity. Of the seven reported genes for plant height, Os03g0309200 encodes a proteins causing significant effect on photosynthesis activity, flowering time, and also have strong effects on stem and inter-node elongation in rice. Among four SNP loci for CA of grain width detected within the characterized genes, Os06g0225300 is directly involved in improving rice architecture for high yield. Overall, the biological function of most of the reported genes were involved in photosynthesis, flower and embryo development, post-embryonic development, multi-cellular organismal development, cellular processes and metabolic processes. Similarly, the molecular function of these genes revealed DNA binding, lipid binding, nucleotide binding, protein binding, DNA binding transcription factor activity, catalytic activity and kinase activity. Such SNP loci situated inside the genes could be used as a candidate marker locus for tagging genomic regions responsible for improving CA of yield traits.
In conclusion, we present the identification of 215 positive and 147 negative coding-based SNP loci of CA for heterosis breeding in japonica rice. Our study implies that the identified SNP loci of CA will increase the selection efficiency of rice breeders in the proper selection of inbred parents. These insights into rice will be productive for future work to improve the CA of parents for yield-related traits and develop superior japonica hybrids by utilizing breeding designs. We expect that, the parental lines with more positive SNP loci of CA and less negative SNP loci of CA will be useful in hybrid breeding, because they have much higher potential for CA improvement for the traits.

AUTHOR CONTRIBUTIONS
DH and IZ conceived the idea and designed the experiment; IZ, WT, EL, SK, HW, and EM contributed to the data collection; IZ analyzed the data and wrote the paper.