A chromosome-level genome assembly of an early matured aromatic Japonica rice variety Qigeng10 to accelerate rice breeding for high grain quality in Northeast China

Early-matured aromatic japonica rice from the Northeast is the most popular rice commodity in the Chinese market. Qigeng10 (QG10) has been one of the varieties with the largest planting area in this region in recent years. It is an early-matured japonica rice variety with many superior traits, including semi-dwarf stature, lodging resistance, long grain, aroma, and good quality. Therefore, a high-quality assembly of the Qigeng10 genome is critical and useful for japonica research and breeding. In this study, we produced a high-precision QG10 chromosome-level genome by using a combination of Nanopore and Hi-C platforms. We assembled the QG10 genome into 77 contigs (N50 length of 11.80 Mb) organized in 27 scaffolds (N50 length of 30.55 Mb). The assembled genome size was 378.31 Mb, and the 65 contigs anchored onto the 12 chromosomes constituted approximately 99.59% of it. We identified a total of 1,080,819 SNPs and 682,392 InDels between QG10 and Nipponbare. We also annotated 57,599 genes by combining ab initio, homology-based, and RNA-seq-based prediction. Based on the assembled genome sequence, we detected the sequence variation in a total of 63 cloned genes involved in grain yield, grain size, disease tolerance, lodging resistance, fragrance, and many other important traits. Finally, we identified five elite alleles (qTGW2^Nipponbare, qTGW3^Nanyangzhan, GW5^IR24, GW6^Suyunuo, and qGW8^Basmati385) controlling long grain size, four elite alleles (COLD1^Nipponbare, bZIP73^Nipponbare, CTB4a^Kunmingxiaobaigu, and CTB2^Kunmingxiaobaigu) controlling cold tolerance, three non-functional alleles (DTH7^Kitaake, Ghd7^Hejiang19, and Hd1^Longgeng31) for early heading, two resistant alleles (Pia^Akihikari and Pid4^Digu) for rice blast, a resistant allele STV11^Kasalath for rice stripe virus, an NRT1.1B^IR24 allele for higher nitrate absorption activity, an elite allele SCM3^Chugoku117 for stronger culms, and the typical aromatic gene badh2-E2 for fragrance in QG10.
These results not only help us to better elucidate the genetic mechanisms underlying excellent agronomic traits in QG10 but also have wide-ranging implications for genomics-assisted breeding in early-matured fragrant japonica rice.


Introduction
Rice (Oryza sativa L.) is a safe and staple food source for more than half of the world's population and serves as a model plant for cereal genetic studies (Gross and Zhao, 2014). De novo sequencing and genomic technologies have been widely applied in rice to promote the shift of breeding schemes from conventional field selection to genomics-assisted breeding (Gu et al., 2022). O. sativa subsp. japonica/Geng and subsp. indica/Xian are the two major subspecies of cultivated rice (Zhang et al., 2016a; Nie et al., 2017). The japonica/Geng rice planting area is 9.87 million ha, accounting for approximately 32.9 percent of the total rice planting area in China (Tang and Chen, 2021). Recently, early-matured japonica/Geng rice has become increasingly important, and its growing area exceeds 4 million ha in Northeast China (Cui et al., 2022). Two draft genome sequences of cultivated rice, japonica/Geng Nipponbare and indica/Xian 93-11, were released in 2002 (Goff et al., 2002; Yu et al., 2002). In 2005, the International Rice Genome Sequencing Project (IRGSP) published the first completed version of the Nipponbare sequence (International-Rice-Genome-Sequencing-Project, 2005).
Over the last two decades, several pan-genomes, including 66 rice genomes (Zhao et al., 2018b), 33 rice genomes (Qin et al., 2021), 111 rice genomes (Zhang et al., 2022a), 251 rice genomes (Shang et al., 2022), and 12 japonica rice genomes (Wang et al., 2023), have been built. In addition, the genomes of IR64, R498, Zhenshan 97, Minghui 63, Taichung Native 1, LTH, Kitaake, IR8, N22, Huajingxian74, HR12, Basmati 334, Dom Sufid, Huazhan, and Tianfeng at the chromosome level, and of Shennong265, DJ123, WR04-6, Suijing18, Koshihikari, Basmati, Kongyu-131, and Guangluai-4 at the scaffold level, have been assembled and released with unprecedented speed (Mahesh et al., 2016; Zhang et al., 2016b; Du et al., 2017; Li et al., 2017; Nie et al., 2017; Stein et al., 2018; Zhao et al., 2018b; Jain et al., 2019; Choi et al., 2020; Tanaka et al., 2020; Panibe et al., 2021; Li et al., 2021a; Yang et al., 2022; Zhang et al., 2022b). These assembled genome sequences will be helpful in pinpointing new causal variants that underlie complex agronomic traits and in identifying many of the genome-specific loci that are absent from the Nipponbare reference genome. However, most of these varieties are indica/Xian rice or landraces, and the genome of japonica/Geng differs significantly from that of indica/Xian (Nie et al., 2017). Since the release of the finished Nipponbare genome, only seven scaffold-level genomes of early-matured japonica/Geng varieties from the northern region of China have been released: Shennong265, Liaogeng5, Yanfeng47, Suijing18, Longgeng31, Daohuaxiang2 (Wuyoudao4), and Kongyu-131 (Nie et al., 2017; Li et al., 2018; Zhao et al., 2018b; Wang et al., 2023). Of these, only Daohuaxiang2 and Suijing18 belong to the early-matured aromatic type. Publicly available chromosome-level japonica/Geng genomes, especially for the early-matured aromatic type, remain largely lacking (Nie et al., 2017). Moreover, a few genomes are not enough to represent the whole genomic content of japonica rice.
De novo assembled genomes of early-matured aromatic varieties would be advantageous for functional genomics and genome research. For example, if a structural variation exists between a particular variety and the reference genome in a candidate region, the guiding role of the reference genome is limited. There is therefore still a need for de novo genome assembly, especially in early-matured aromatic rice breeding research.
Fragrance and long grain are key grain quality traits that directly influence the global market price of rice (Hui et al., 2022). Basmati rice and jasmine rice are the two most popular fragrant indica rice types in the world. However, consumers in East Asia, including North China, Japan, and Korea, tend to prefer japonica rice (Lu et al., 2022). The aromatic long-grain rice from Northeast China, represented by Wuyoudao4 (WYD4), is therefore the most famous rice in the Chinese market. WYD4 has superior quality but also a number of defects, most importantly poor lodging resistance, lack of cold tolerance and blast resistance, and late maturity (Gao et al., 2012). In 2019, the Qiqihar Branch of Heilongjiang Academy of Agricultural Sciences developed Qigeng10 (QG10) to solve these defects of WYD4. The planting area of QG10 was 0.4 million ha over the last three years, and QG10 has become a major variety of early-matured aromatic long-grain rice in Northeast China. The construction of a high-quality chromosome-level genome of QG10 is very important for improving the efficiency of studies on the genetic mechanisms of desirable agronomic traits, such as eating quality, cold tolerance, lodging tolerance, and early maturity, as well as for accelerating high-quality rice breeding by design in the cold region of Northeast China (Li et al., 2021a). Here, we produced a high-precision QG10 chromosome-level genome by performing whole-genome sequencing on the Nanopore platform (Lin et al., 2021), followed by Hi-C-assisted assembly (Van Berkum et al., 2010). Our results provide several functionally important candidate alleles for grain length, cold tolerance, early heading, disease resistance, lodging resistance, and nitrate use for rice breeding in the cold region of Northeast China.

Materials
The early-matured aromatic long-grain japonica rice variety QG10, which was developed by our own group, was licensed for release in 2019 and is now widely planted in Heilongjiang province in Northeast China. It is a semi-dwarf rice variety with many superior traits, such as long panicle, long grain, aroma, and good quality (Figures 1A-D). It was selected from the cross between two aromatic japonica/Geng rice varieties, Wuyoudao4 (WYD4) and Suigeng4 (SG4) (Figure 1E). The seedlings of QG10 were grown on the agricultural farm of the Qiqihar Branch of Heilongjiang Academy of Agricultural Sciences. Field management followed the practices most commonly used by local farmers. Leaves, stems, roots, and panicles at the heading stage were collected from plants grown in the experimental station and frozen in liquid nitrogen for RNA isolation. The young leaves of a single young plant were used to isolate genomic DNA.

Oxford Nanopore sequencing and genome assembly
The high molecular weight genomic DNA of QG10 was extracted from 15-day-old leaf tissues following a modified CTAB method. Whole genome sequencing was done following the standard instructions of the Ligation Sequencing Kit (Nanopore, Oxfordshire, UK). The quantified DNA was randomly sheared, and fragments of ∼20 kb were enriched and purified. Then, a 20-kb library was constructed and sequenced on the Nanopore PromethION platform according to the manufacturer's protocols (Jiang et al., 2020).

Hi-C library construction and sequencing
A Hi-C fragment library with a 300-700 bp insert size was constructed from the genomic DNA of QG10. The library was sequenced on the Illumina HiSeq4000 ™ platform by Biomarker Technologies (Beijing, China). Adapter sequences were trimmed from the raw reads and low-quality paired-end reads were removed to obtain clean data. The clean Hi-C reads were first truncated at the putative Hi-C junctions, and the resulting trimmed reads were then aligned to the assembly with the bwa aligner (Li and Durbin, 2009). Only uniquely aligned read pairs with a mapping quality of more than 20 were retained for further analysis. Invalid read pairs, including dangling-end, self-cycle, re-ligation, and dumped products, were filtered with HiC-Pro v2.8.1 (Servant et al., 2015). Of the uniquely mapped read pairs, 96.25% were valid interaction pairs; these were used to correct the scaffolds and to cluster, order, and orient the scaffolds onto chromosomes with LACHESIS (Burton et al., 2013).
Figure 1. Plant type, panicle phenotype, seed phenotype, grain phenotype, and pedigree of QG10.
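The read-pair retention rule in the Hi-C methods (both mates uniquely aligned, mapping quality above 20) can be illustrated with a minimal sketch. This is not the authors' pipeline or HiC-Pro's implementation; the `ReadPair` record and its field names are hypothetical and only stand in for values that would normally be read from the alignment file.

```python
from typing import Iterable, List, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical summary of one aligned Hi-C read pair."""
    name: str
    mapq1: int      # mapping quality of mate 1
    mapq2: int      # mapping quality of mate 2
    unique1: bool   # mate 1 has a single best alignment
    unique2: bool   # mate 2 has a single best alignment


def keep_pair(pair: ReadPair, min_mapq: int = 20) -> bool:
    """Keep a pair only if both mates are unique and both MAPQ values exceed min_mapq."""
    return (
        pair.unique1
        and pair.unique2
        and pair.mapq1 > min_mapq
        and pair.mapq2 > min_mapq
    )


def filter_pairs(pairs: Iterable[ReadPair], min_mapq: int = 20) -> List[ReadPair]:
    """Return the pairs that pass the uniqueness and mapping-quality rule."""
    return [p for p in pairs if keep_pair(p, min_mapq)]


# Toy input: only the first pair passes.
toy_pairs = [
    ReadPair("pair_ok", 60, 42, True, True),
    ReadPair("pair_low_mapq", 60, 20, True, True),     # MAPQ 20 is not "more than 20"
    ReadPair("pair_multimapped", 60, 60, True, False),
]
kept = filter_pairs(toy_pairs)
```

The valid-pair classification (dangling-end, self-cycle, and so on) is a separate step that depends on restriction-fragment positions and is not modelled here.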

Genome assembly and Hi-C scaffolding
Before chromosome assembly, we first performed a preassembly for error correction of scaffolds which required the splitting of scaffolds into segments of 50 kb on average. The Hi-C data were mapped to these segments using the software package BWA (version 0.7.10-r789) (Li and Durbin, 2009). The uniquely mapped data were retained to perform assembly by using the software package LACHESIS (Burton et al., 2013). Any two segments which showed an inconsistent connection with information from the raw scaffold were checked manually. These corrected scaffolds were then assembled with the software package LACHESIS. After this step, placement and orientation errors exhibiting obvious discrete chromatin interaction patterns were manually adjusted.
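The pre-assembly step splits each scaffold into segments of 50 kb on average. The exact splitting rule is not stated in the text, so the following is only one plausible sketch: it divides a scaffold into near-equal pieces whose mean length is as close to 50 kb as an integer number of pieces allows. The function name and the rounding choice are assumptions.

```python
from typing import List, Tuple


def split_scaffold(length: int, segment_size: int = 50_000) -> List[Tuple[int, int]]:
    """Split [0, length) into near-equal half-open intervals of about segment_size bp."""
    if length <= 0:
        return []
    # Number of pieces: nearest integer to length / segment_size, at least one.
    n_segments = max(1, round(length / segment_size))
    base, extra = divmod(length, n_segments)
    intervals = []
    start = 0
    for i in range(n_segments):
        # The first `extra` segments absorb the remainder, one base each.
        end = start + base + (1 if i < extra else 0)
        intervals.append((start, end))
        start = end
    return intervals


segments = split_scaffold(120_000)
```

Each interval can then be treated as an independent unit when Hi-C reads are mapped, so that a segment whose contacts disagree with its neighbours in the raw scaffold can be flagged for manual checking.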

Gene prediction and genome annotation
The RNA of QG10 was isolated from mixed tissues (leaves, culms, roots, and panicles) following the manufacturer's protocol (Wang et al., 2023). We then performed the sequencing on the Illumina HiSeq 2500 platform according to the manufacturer's instructions. A repeat sequence database for the genome was constructed, on the principles of structure-based and de novo prediction, with LTR_FINDER v1.05 (Xu and Wang, 2007) and RepeatScout v1.0.5 (Price et al., 2005). The PASTE Classifier was used to classify the database (Hoede et al., 2014), which was then merged with the Repbase database to form the final repeat sequence database (Jurka et al., 2005). RepeatMasker v4.0.6 was used to predict the repeat sequences of the QG10 genome based on the constructed repeat sequence database (Tarailo-Graovac and ). Gene models were predicted by three approaches. For de novo prediction, Genscan (Burge and Karlin, 1997), Augustus v2.4 (Stanke and Waack, 2003), GlimmerHMM v3.0.4 (Majoros et al., 2004), GeneID v1.4 (Alioto et al., 2018), and SNAP (Korf, 2004) were used. For homology-based prediction, GeMoMa v1.3.1 was used (Keilwagen et al., 2016; Keilwagen et al., 2018). For transcriptome-based prediction, Hisat v2.0.4 (Kim et al., 2015) and Stringtie v1.2.3 (Pertea et al., 2015) were used for reference-based transcript assembly, TransDecoder v2.0 and GeneMarkS-T v5.1 were used for gene prediction, and PASA v2.0.2 was used to predict Unigene sequences without a reference based on the transcriptome data (Campbell et al., 2006). Finally, EVM v1.1.1 (Haas et al., 2008) was used to integrate the prediction results obtained by the above three methods.
The predicted gene sequences were compared with NR, KOG, GO, KEGG, TrEMBL, and other functional databases using BLAST v2.2.31 (-evalue 1e-5) (Altschul et al., 1990) to obtain KEGG pathway, KOG function, GO function, and other functional annotations of the genes.
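As an illustration of how BLAST hits at the `-evalue 1e-5` cutoff might be reduced to one annotation per gene, the sketch below parses BLAST tabular output and keeps the highest-scoring hit for each query. It assumes the standard 12-column tabular format (`-outfmt 6`), which the text does not specify; the gene and protein names in the example are made up.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def best_hits(handle, max_evalue=1e-5):
    """Return {query: best hit} keeping only hits with evalue <= max_evalue."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != len(BLAST6_FIELDS):
            continue  # skip blank or malformed lines
        rec = dict(zip(BLAST6_FIELDS, row))
        evalue = float(rec["evalue"])
        bitscore = float(rec["bitscore"])
        if evalue > max_evalue:
            continue
        current = best.get(rec["qseqid"])
        # Prefer the hit with the highest bit score for each query.
        if current is None or bitscore > current["bitscore"]:
            best[rec["qseqid"]] = {
                "subject": rec["sseqid"],
                "evalue": evalue,
                "bitscore": bitscore,
            }
    return best


# Hypothetical three-line result: gene2 has no hit below the cutoff.
example = io.StringIO(
    "gene1\tprotA\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-50\t380\n"
    "gene1\tprotB\t60.0\t150\t60\t2\t1\t150\t5\t154\t1e-10\t90\n"
    "gene2\tprotC\t30.0\t50\t35\t1\t1\t50\t1\t50\t0.5\t25\n"
)
hits = best_hits(example)
```

Choosing the best hit by bit score is a design choice of this sketch; the annotation pipeline used in the study may apply different tie-breaking or database-specific rules.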

Identification of genomic sequence variation in important genes
The whole-genome assembly sequences of QG10 were compared with the rice reference genome sequence (Oryza_sativa_MSU7 version) using MUMmer v3 (Kurtz et al., 2004). Based on the MUMmer results, the sequence variations and SVs were further re-called using BLAST. Synteny and inversions were analyzed using GenomeSyn_Win.v1. At the site of each sequence variant, the genotypic information for QG10, Nipponbare, and the elite variety carrying the important gene was called according to the results of the one-to-one alignments. The allelic information of sequence variants was determined based on the gff files of the Oryza_sativa_MSU7 version. ClustalW v1.8.3 (Thompson et al., 1994) and BLAST v2.2.31 were used to re-detect the sequence variations and to perform detailed haplotype analyses for the well-characterized genes in rice (Zhao et al., 2018b).
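Calling SNPs and InDels from a one-to-one alignment amounts to walking the two aligned sequences column by column. The sketch below shows that idea on a pair of gapped strings; it is a simplified illustration, not the MUMmer or BLAST procedure used in the study, and the coordinate convention (1-based reference positions, insertions reported at the preceding reference base) is an assumption of this example.

```python
from typing import List, Tuple

Variant = Tuple[str, int, str, str]  # (type, reference position, ref allele, query allele)


def call_variants(ref_aln: str, qry_aln: str) -> List[Variant]:
    """List SNPs, insertions, and deletions from two equal-length gapped sequences."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    variants: List[Variant] = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r == "-" and q == "-":
            i += 1  # column with gaps in both sequences carries no information
        elif r == "-":
            # Insertion in the query: extend over the whole run of reference gaps.
            j = i
            while j < n and ref_aln[j] == "-" and qry_aln[j] != "-":
                j += 1
            variants.append(("INS", ref_pos, "", qry_aln[i:j]))
            i = j
        elif q == "-":
            # Deletion in the query: extend over the whole run of query gaps.
            j = i
            while j < n and qry_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            variants.append(("DEL", ref_pos + 1, ref_aln[i:j], ""))
            ref_pos += j - i
            i = j
        else:
            ref_pos += 1
            if r.upper() != q.upper():
                variants.append(("SNP", ref_pos, r, q))
            i += 1
    return variants


calls = call_variants("ACGT-ACGT", "ACCTTAC-T")
```

For this toy alignment the function reports one SNP at reference position 3, one insertion after position 4, and one deletion at position 7.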

Nanopore sequencing and genome assembly
We sequenced QG10 genomic DNA to generate about 73.52 Gb of Nanopore sequencing raw data. After data quality control, the clean data volume was 63.23 Gb, comprising 3,279,893 reads and corresponding to a total sequencing depth of 166.34-fold. Reads with lengths of 10-20 kb and 20-30 kb accounted for 51.56% (Table 1). The mean read length of the clean sequencing data was 19.28 kb, with an N50 length of 25.64 kb. The clean Nanopore sequencing data were first corrected with Canu (Koren et al., 2017). The third-generation sequencing data were then corrected with Racon (Vaser et al., 2017) for three rounds. Next, the second-generation data were used for three rounds of correction with Pilon (Walker et al., 2014), and contaminating sequences were removed according to NT alignment. Finally, we obtained a 380.15 Mb genome sequence with a contig N50 of 12.24 Mb. The completeness estimated by Benchmarking Universal Single-Copy Orthologs (BUSCO) was 98.12%.
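The read statistics reported above can be cross-checked with simple arithmetic. The sketch below assumes that depth is the clean data volume divided by the polished assembly size (380.15 Mb); the denominator actually used by the authors is not stated, which may explain the small difference from the reported 166.34-fold.

```python
# Values taken from the text; the choice of denominator for depth is an assumption.
clean_bases = 63.23e9        # clean Nanopore data, in bases
read_count = 3_279_893       # number of clean reads
assembly_bases = 380.15e6    # polished contig assembly size, in bases

# Mean read length = total bases / number of reads, expressed in kb.
mean_read_length_kb = clean_bases / read_count / 1e3

# Sequencing depth = total bases / genome size.
depth = clean_bases / assembly_bases

print(f"mean read length: {mean_read_length_kb:.2f} kb")
print(f"sequencing depth: {depth:.1f}-fold")
```

Under these assumptions the mean read length agrees with the reported 19.28 kb, and the depth comes out at roughly 166.3-fold.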

Genome assembly and Hi-C scaffolding
We conducted the assembly in a stepwise fashion following a previously reported approach (Jiang et al., 2020). The Hi-C sequencing raw data were filtered, and adapter sequences and low-quality reads were removed to obtain high-quality clean data. The clean data were aligned to the preliminarily assembled genome to obtain the mapped data, and the effective Hi-C data were then used for further assembly of the draft genome sequence. LACHESIS was used to cluster, order, and orient the preliminarily assembled genome sequence, yielding the genomic sequence at the chromosome level. The final QG10 assembly consisted of 77 contigs (N50 length of 11.80 Mb) organized in 27 scaffolds (N50 length of 30.55 Mb). The assembled genome size was 378.31 Mb, and 65 contigs constituted approximately 99.59% of the whole genome. Visualization of the Hi-C signals indicated that 12 square matrix areas in the Hi-C heat map displayed significant differences from the background. These scaffolds were anchored onto chromosomes 1-12, respectively (Figure 2A; Table 2).
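The contig and scaffold N50 values quoted above follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation, using made-up lengths rather than the QG10 contigs, is:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half the total is the N50.
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for completeness


# Toy lengths (arbitrary units): total 290, half 145, cumulative 80 then 150.
toy_n50 = n50([80, 70, 50, 40, 30, 20])
```

Because scaffolding joins contigs into longer sequences, the scaffold N50 of an assembly is never smaller than its contig N50, which is consistent with the 11.80 Mb and 30.55 Mb values reported here.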
According to the whole genome comparison, the genome of QG10 showed four sequence inversions with a length of about 0.5-2 Mb compared with the Nipponbare genome at the position of about 14-16 Mb on chromosome 4, 30-31 Mb on chromosome 5, 5.5-6 Mb on chromosome 8, and 5.5-6 Mb on chromosome 10 ( Figure 2B). We identified 1,080,819 SNPs and 682,392 InDels between QG10 and Nipponbare on 12 chromosomes ( Figures 2C, D).

Genome annotation and repeat analysis
We annotated the repeat regions in our QG10 assembly with RepeatMasker and detected 492,503 repetitive regions with a total repeat length of 177.52 Mb, comprising 242,211 Class I retrotransposons, 223,439 Class II DNA transposons, 833 potential host genes, and 2,222 simple sequence repeats (Table 3). The repeat regions make up 46.7% of the QG10 genome assembly.
We predicted 39,465 genes by the ab initio method, 56,999 genes by the homology-based method, and 24,998 genes by RNA-seq. Integrating the prediction results of these three methods with EVM v1.1.1 yielded a total of 57,599 genes (Table 4). A total of 723 tRNAs, 306 rRNAs, 194 miRNAs, and 5,392 pseudogenes were also predicted. In total, 94.11% of the genes could be annotated in NR, GO, KOG, KEGG, and other databases (Table 5).
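The repeat fraction can be checked against the two assembly sizes given in the text. Which size served as the denominator is not stated; the sketch below simply computes both, treating the choice as an open assumption.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


repeat_mb = 177.52      # total repeat length reported in the text
polished_mb = 380.15    # assembly size after polishing
scaffolded_mb = 378.31  # assembly size after Hi-C scaffolding

fraction_polished = percent(repeat_mb, polished_mb)
fraction_scaffolded = percent(repeat_mb, scaffolded_mb)
```

With the 380.15 Mb denominator the result rounds to the reported 46.7%, whereas the 378.31 Mb denominator gives about 46.9%.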

Sequence variants of the genes controlling grain length
We investigated the sequence variations in 19 cloned genes controlling grain size, including GW2 (Song et al., 2007), GS2, qTGW2 (Ruan et al., 2020), BG1, OsLG3 (Si et al., 2016), GW7, qGW8 (Wang et al., 2012), and GS9 (Zhao et al., 2018a), to explain the long grain of QG10. A total of five elite alleles controlling grain size (qTGW2^Nipponbare, qTGW3^Nanyangzhan, GW5^IR24, GW6^Suyunuo, and qGW8^Basmati385) were identified in QG10 (Figure 3). The qTGW2 allele in QG10 was identical to that of Nipponbare, carrying the key variant (G/A) at -1818 bp in the promoter region (Figure 3A). The qTGW3 allele in QG10 had the same key splicing-site mutation as Nanyangzhan, a long-grain indica rice (Figure 3B). The GW5 allele in QG10, like that of IR24, a long narrow-grain indica rice, did not carry the critical 1,212 bp deletion (Figure 3C). The GW6 allele in QG10 had the same key mutation (6 bp) as Suyunuo, a wider-grain indica rice (Figure 3D). The qGW8 allele in QG10 had the same five variants as Basmati385, a long narrow-grain indica rice (Figure 3E). Tracing the origin of these genes showed that qTGW3 (Nanyangzhan type), GW5 (IR24 type), GW6 (Suyunuo type), and qGW8 (Basmati385 type) belong to the indica subspecies, while qTGW2 (Nipponbare type) is mainly derived from the japonica subspecies. In addition, these grain-type alleles are all rare genotypes in japonica rice and have important application value in long-grain japonica rice breeding.
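Assigning an allele type such as "Nanyangzhan type" comes down to comparing the genotype of the new assembly with known varieties at the key variant sites of a gene. The sketch below shows that comparison on toy data; the site labels, genotypes, and donor names are all hypothetical and are not taken from the figures.

```python
from typing import Dict, List

Haplotype = Dict[str, str]  # key variant site -> allele


def matching_donors(query: Haplotype, donors: Dict[str, Haplotype]) -> List[str]:
    """Return the donors whose alleles equal the query at every key site of the query."""
    return [
        name
        for name, haplotype in donors.items()
        if all(haplotype.get(site) == allele for site, allele in query.items())
    ]


# Hypothetical genotypes at two key sites of one gene.
toy_query = {"site_1": "A", "site_2": "del6"}
toy_donors = {
    "donor_X": {"site_1": "A", "site_2": "del6"},
    "donor_Y": {"site_1": "G", "site_2": "del6"},
    "donor_Z": {"site_1": "A"},  # site_2 not genotyped, so no match is claimed
}
same_type = matching_donors(toy_query, toy_donors)
```

Requiring identity at every key site is a conservative choice: a donor with a missing genotype is not reported as a match.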

Sequence variants of the genes controlling cold tolerance
Cold tolerance is a key agricultural trait controlling rice production and geographic distribution. We investigated the sequence variations in 10 cloned genes controlling cold tolerance, including bZIP73 ( (Fujino et al., 2008), and qPSR10 (Xiao et al., 2018), to explain the high cold tolerance of QG10. A total of four elite alleles controlling cold tolerance (COLD1^Nipponbare, bZIP73^Nipponbare, CTB4a^Kunmingxiaobaigu, and CTB2^Kunmingxiaobaigu) were identified in QG10 (Figure 4). The COLD1 allele in QG10 was identical to that of Nipponbare, carrying the key SNP in the fourth exon (Figure 4A). The bZIP73 allele in QG10 had the same key SNP (G/A) as Nipponbare (Figure 4B). The CTB4a allele in QG10 had the same ten mutations as Kunmingxiaobaigu, a cold-tolerant variety from Yunnan Province (Figure 4C). The CTB2 allele in QG10 also had the same ten key SNPs as Kunmingxiaobaigu (Figure 4D). COLD1 (Nipponbare type) and bZIP73 (Nipponbare type) both belong to the japonica subspecies. CTB4a (Kunmingxiaobaigu type) and CTB2 (Kunmingxiaobaigu type) are both rare alleles in Northeast japonica rice and have important application value in Northeast japonica rice breeding.

Sequence variants of the genes controlling early heading
The heading date is one of the most important factors determining rice distribution and final yield. We investigated the sequence variations in 11 cloned genes related to early heading under long-day conditions, including DTH7 (Gao et al., 2014), Ghd7 (Xue et al., 2008), Ghd8 (Yan et al., 2011), Ehd1 (Doi et al., 2004), Ehd3 (Matsubara et al., 2011), Ehd4 (Gao et al., 2013), Hd1 (Yano et al., 2000), Hd3a (Kojima et al., 2002), Hd6 (Takahashi  (Hori et al., 2013), and Hd17 (Matsubara et al., 2012). Among these heading date genes, only DTH7, Ghd7, and Hd1 were found to carry non-functional alleles. The DTH7 allele in QG10 had the same three mutations as Kitaake, which originated at the northern limit of rice cultivation in Hokkaido, Japan (Figure 5A). Kitaake is reported to be insensitive to day length, short in stature, and to complete its life cycle in about 9 weeks (Jain et al., 2019). The Ghd7 allele in QG10 had the same critical mutations as Hejiang19, an early-maturing rice variety in Heilongjiang Province (Figure 5B). The Hd1 allele in QG10 was the same non-functional allele as in Longgeng31, a major planted rice variety in Heilongjiang Province (Figure 5C). These results indicate that the combination of the Hd1, DTH7, and Ghd7 alleles determines the early heading of QG10 in the northernmost province of China.

Sequence variants of the genes controlling disease resistance
Blast is one of the most devastating rice diseases in Heilongjiang Province. We investigated the sequence variations in 14 cloned rice blast-resistant genes, including Pi5 (Lee et al., 2009), Pi21 (Fukuoka et al., 2009), Pi36 (Liu et al., 2007), Pi37, Pi54 (Sharma et al., 2010), Pi56, Pia (Okuyama et al., 2011), Pish (Takahashi et al., 2010), Pit (Hayashi and Yoshida, 2009), Pita (Bryan    Chen et al., 2018), to explain the blast resistance. Only two blast resistance genes, Pia and Pid4, were found in QG10. The Pia allele in QG10 had the same resistant genotype as Akihikari and encodes a nucleotide-binding site (NBS) and C-terminal leucine-rich repeat (LRR) domain protein (Figure 6A). The Pid4 allele in QG10 had the same resistant genotype as Digu and encodes a coiled-coil nucleotide-binding site leucine-rich repeat (CC-NBS-LRR) protein (Figure 6B). Pia (Akihikari type) is the major blast-resistant gene in Northeast China. Pid4 (Digu type) is a rare allele in japonica rice and has important application value in Northeast japonica rice breeding.
Rice stripe virus (RSV), an RNA virus belonging to the genus Tenuivirus and transmitted by small brown planthoppers, causes one of the most destructive rice diseases (Wang et al., 2014). RSV has become increasingly serious in Heilongjiang province in recent years, yet the majority of japonica varieties cultivated in Heilongjiang are highly susceptible to RSV (Wang et al., 2014). STV11 was the first cloned RSV resistance gene; it encodes a sulfotransferase (OsSOT1) catalyzing the conversion of salicylic acid (SA) into sulphonated SA (SSA) (Wang et al., 2014). The STV11 allele in QG10 had the same resistant genotype as Kasalath, a high-resistance indica landrace (Figure 6C). The STV11 allele of QG10 is therefore a useful resistance gene for japonica rice breeding.
Only the SCM3 allele in QG10 was found to carry the same key allele as Chugoku117, a strong-culm rice variety from Japan (Figure 7B). No elite grain number gene was found in QG10. NRT1.1B is a key gene controlling nitrogen-use efficiency (NUE) in rice. The NRT1.1B allele in QG10 carried the indica variation, which confers higher nitrate absorption activity (Figure 7C).

Discussion
Indica genome introgression and large SVs were found in Qigeng10
Japonica/Geng and indica/Xian are the two major subspecies of Asian cultivated rice (Zhang et al., 2016a). Owing to long-term differentiation and adaptation, both indica and japonica rice contain many favorable genes. Therefore, combining the favorable genes of the two subspecies has great value for creating genotypes with greater yield potential, stronger stress resistance, and better quality (Gu, 2010). Over the past 50 years, the combination of plant ideotypes and favorable vigor through hybridization between indica and japonica rice has greatly contributed to yield improvements in modern japonica rice in Northeast China (Tang and Chen, 2021). In recent years, a series of high-yielding and good-quality japonica cultivars have been obtained from indica/japonica hybridization, and their cultivation area exceeds 4 million ha in Northeast China (Cui et al., 2022). The new fragrant early japonica rice cultivar QG10 was derived from a cross between Wuyoudao4 and Suigeng4, both of which were derived from indica/japonica hybridization. Recently, Wang et al. (2023) chose six interrelated modern Chinese temperate japonica varieties and six related Japanese japonica varieties to investigate genome enhancement in temperate japonica varieties during modern breeding. They found many large SVs in Zhonghua11 (ZH11), Liaogeng5 (LG5), and Daohuaxiang2 (DHX2/WYD4). Large-fragment introgressions from indica at the same locations were also found in QG10 on chromosomes 4, 8, and 10 (Figure 2B). Several superior indica alleles, including qTGW3^Nanyangzhan, GW5^IR24, GW6^Suyunuo, qGW8^Basmati385, Pid4^Digu, STV11^Kasalath, NRT1.1B^IR24, and badh2-E2, were also found in QG10. This information indicates that indica genome introgression has been common in modern temperate japonica rice breeding in Northeast China.
Figure 4. The allelic information of sequence variants in COLD1 (A), bZIP73 (B), CTB4a (C), and CTB2 (D) controlling cold tolerance in rice.
Figure 5. The allelic information of sequence variants in DTH7 (A), Ghd7 (B), and Hd1 (C) controlling heading date in rice.
Figure 6. The allelic information of sequence variants in Pia (A), Pid4 (B), and STV11 (C) controlling disease resistance in rice.
Jiang et al. 10.3389/fpls.2023.1134308 Frontiers in Plant Science frontiersin.org

Superior alleles were found in Qigeng10 and had important value for breeding in Northeast China
Greater yield potential, stronger stress resistance, and better quality (longer grain and fragrance) are key agronomic traits that directly influence the market price of rice. Consumers in East Asia, including North China, Japan, and Korea, tend to prefer longer fragrant japonica rice (Lu et al., 2022). The longer fragrant japonica rice from Northeast China, represented by Daohuaxiang2 (DHX2/WYD4), is therefore the most famous rice in the Chinese market. QG10 was derived from DHX2 and solved some of its defects, including poor lodging resistance, lack of cold tolerance, weak blast resistance, and late maturity. Therefore, the construction of a high-quality genome of QG10 is essential for further improvement of this cultivar or its progenies, as well as for accelerating fragrant japonica rice breeding, by providing genomic resources that can be directly applied to fragrant japonica rice cultivars. In this study, we found five superior alleles (qTGW2^Nipponbare, qTGW3^Nanyangzhan, GW5^IR24, GW6^Suyunuo, and qGW8^Basmati385) controlling long grain size in QG10. To compare the phenotypes of different gene haplotypes in rice germplasm, we investigated the grain shape traits and days to heading of the 3k cultivars available on the website (https://www.rmbreeding.cn/).
The results showed that the QG10 haplotypes were associated with longer and more slender grain (Figures 8A-E) and earlier heading (Figures 8F-H). Most of them are rare alleles for longer grain and have seldom been applied in rice breeding in Northeast China. The blast-resistant allele (Pid4^Digu), the RSV-resistant allele (STV11^Kasalath), and the NUE allele (NRT1.1B^IR24) are also rare alleles in rice breeding in Northeast China. In the future, we will develop molecular markers for the improvement of japonica rice varieties in Northeast China and to expand the gene pool of japonica rice in the region.

Conclusions
In this study, we present a chromosome-level genome assembly of the early-matured aromatic long-grain japonica rice variety Qigeng10, produced by using a combination of Nanopore and Hi-C platforms. The total assembly size is 378.31 Mb with a scaffold N50 length of 30.55 Mb. A total of 18 superior haplotypes were identified in QG10, including five long-grain alleles (qTGW2^Nipponbare, qTGW3^Nanyangzhan, GW5^IR24, GW6^Suyunuo, and qGW8^Basmati385), four cold-tolerant alleles (COLD1^Nipponbare, bZIP73^Nipponbare, CTB4a^Kunmingxiaobaigu, and CTB2^Kunmingxiaobaigu), three non-functional heading date alleles (DTH7^Kitaake, Ghd7^Hejiang19, and Hd1^Longgeng31), two blast-resistant alleles (Pia^Akihikari and Pid4^Digu), a rice stripe virus resistant allele STV11^Kasalath, a higher nitrate absorption allele NRT1.1B^IR24, a lodging-resistant allele SCM3^Chugoku117, and the typical aromatic allele badh2-E2. This information will accelerate fragrant japonica rice breeding in Northeast China by providing genomic resources that can be directly applied to fragrant japonica rice cultivars or to the development of molecular markers for the improvement of japonica rice varieties.
Figure 7. The allelic sequence variants of BADH2 (A), SCM3 (B), and NRT1.1B (C) in rice.

Data availability statement
The data presented in the study are deposited in the Data Center of Beijing Institute of Genomics (BIG) repository, accession number WGS029943 (PRJCA013131; SAMC988458).

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Figure 8. Comparison of grain shape traits and days to heading between different gene haplotypes in 3k panel.