Chromosome-Scale Assembly and Characterization of the Albino Northern Snakehead, Channa argus var. (Teleostei: Channidae) Genome

Northern snakehead, Channa argus (C. argus), is an economically and ecologically important fish species. Its wild population has declined sharply over the last decade. A high-quality reference genome would lay a solid foundation for genetic and conservation studies of C. argus. In this work, we report a chromosome-level genome assembly built with PacBio and Hi-C technology from the albino northern snakehead, a color variety of C. argus. A 644.1-Mb genome with 24 chromosomes was obtained, with contig and scaffold N50 values of 11.78 and 27.8 Mb, respectively. We inferred that C. argus diverged from Anabas testudineus around 85.6 million years ago. A total of 514 expanded gene families and 214 positively selected genes were identified in the C. argus genome. The chromosome-level genome provides a valuable high-quality genomic resource for population, genetic, and evolutionary studies of C. argus and other species in Channidae.


INTRODUCTION
Northern snakehead fish, Channa argus (C. argus), which belongs to Osteichthyes, Perciformes, and Channidae, is an ecologically and economically important fish species in tropical and subtropical Asia and Africa (Cheng and Zheng, 1987; Courtenay and Williams, 2004). C. argus has many characteristics that make it well suited to aquaculture, including strong fecundity, fast growth, hypoxia tolerance, delicious taste, exquisite meat quality, fewer bone spurs, and high nutritional value (Glass et al., 1986; Chen and Yang, 2013). In addition, C. argus has been reported to have medicinal value, such as removing blood stasis, generating muscle and blood, nourishing, and conditioning, and it has commonly been used as a daily tonic food and to aid wound healing (Yuan et al., 2005; Qin and Jiang, 2010). Moreover, the excellent hypoxia tolerance of C. argus, which is due to the upper gill organ, enables long-distance transportation of the species. C. argus has become a widely farmed aquatic species in China, and its breeding industry has developed rapidly in recent years (Li et al., 2007; Xiao et al., 2015).
The albino northern snakehead (C. argus var.) is a member of the family Channidae of the Perciformes and is distributed mainly in the middle and lower reaches of the Jialing River Basin in Sichuan Province, China (Shi et al., 1980). According to records, C. argus var. was originally considered a subspecies of the northern snakehead C. argus (Wang et al., 1992). Subsequently, a large body of molecular evidence showed that C. argus var. is not a subspecies of C. argus but an albino population (Li et al., 2016; Zhou et al., 2017). Because of its freshness, high nutritional value (e.g., high omega-6 polyunsaturated fatty acid levels), and potential ornamental value (e.g., an entirely white body) compared with C. argus (Figure 1), it is a valuable economic and ornamental fish in China (Zou et al., 2017; Zhou et al., 2018). The average market price of C. argus var. is about 3-4 times higher than that of C. argus (Zhou et al., 2018). However, owing to environmental deterioration and overfishing, the wild populations of the species have been declining over the last decade (Zhou et al., 2018). Additionally, the low survival rate and the aberration rate in larval breeding have seriously limited the development of intensive aquaculture of C. argus var.
The genome is one of the most important genetic resources for ecological and breeding studies of a species, especially for research aimed at improving economic traits of farmed animals. In recent years, the genomes of many fish and other aquatic species have been reported, including Takifugu rubripes (Aparicio et al., 2002), Oryzias latipes (Kasahara et al., 2007), Danio rerio (Howe et al., 2013), Cyprinus carpio (Xu et al., 2014), Litopenaeus vannamei (Zhang et al., 2019), Oxygymnocypris stewartii (Liu et al., 2019), Datnioides undecimradiatus (Sun et al., 2020), and Platycephalus sp.1 (Xu et al., 2021). A previous study has shown that the usefulness of a reference genome largely depends on the continuity and completeness of the genome sequences (Xiao et al., 2020). The genome of C. argus was reported in 2017 (Xu et al., 2017) and provides basic genomic data for studies of the species. However, the public genome of C. argus was assembled from short reads generated by next-generation sequencing technologies and was highly fragmented, with a contig N50 length of 81.4 kb (Xu et al., 2017). More importantly, the genome was not assembled to the chromosomal level, so it cannot provide sufficient genomic information for studies of chromosome evolution or for fine mapping of functional genes underlying important economic traits (Dan et al., 2018). There is a great demand for a chromosome-level, high-quality reference genome of C. argus to facilitate and promote evolutionary and conservation studies and the mapping of functional genes for critical economic traits of the species.
Here, for the first time, we present a high-quality chromosome-level genome assembly for the albino northern snakehead (C. argus var.) built with a combined strategy of Illumina, PacBio, and Hi-C technology. The contig and scaffold N50 lengths reached 11.78 and 27.8 Mb, respectively. More than 95.8% of the assembled sequence was anchored onto 24 chromosomes, demonstrating the outstanding completeness and sequence continuity of the reference genome. A total of 22,593 protein-coding genes were predicted in the assembled genome, and more than 91.9% of those genes were functionally annotated. We believe that this high-quality chromosome-level genome will provide a valuable reference not only for the genomic dissection of phenotypic variation in the species but also for evolutionary investigation of the family Channidae among teleosts.

Sample Collection
A female individual of C. argus var. was reared in Neijiang Fish Farm (Neijiang City, Sichuan Province, China) and was used for the genome sequencing and assembly (Figure 1). A total of 12 tissues, including white muscle, skin, spleen, liver, intestine, ovary, swim bladder, kidney, heart, brain, eye, and gill, were collected and then quickly frozen and stored in liquid nitrogen for 6 h. Of these tissues, the white muscle was used for DNA sequencing for genome assembly, and all tissues were used for transcriptome sequencing.

DNA and RNA Sequencing
To construct a DNA sequencing library, we extracted the genomic DNA from the muscle tissue of a female individual using the standard phenol/chloroform extraction method. The quality of genomic DNA molecules was checked, and we required that the main band of extracted DNA was around 20 kb in the agarose gel electrophoresis experiment, and the DNA spectrophotometer ratio (SP) 260/280 was larger than 1.8. Subsequently, short-read (insert size: 250 bp) and long-read (insert size: 20 kb) DNA sequencing libraries were created according to the protocols of manufacturers, the former one for the whole genome sequencing based on the Illumina HiSeq X Ten platform and the latter for the PacBio Sequel platform, respectively.
RNA-seq data can be used to improve the quality of the genome annotation. To include as many tissue-specific expressed transcripts for the analysis as possible, RNAs of all the 12 collected tissues were extracted using the TRIzol Reagent (Invitrogen, Carlsbad, CA, USA). The purified RNA quantity and quality for each tissue were assessed, and we required that the absorbance be larger than 1.7 at 260 nm/280 nm based on the NanoDrop ND-1000 spectrophotometer (LabTech, Hopkinton, MA, USA) and the RIN value was larger than 8.5 on the basis of the 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA), respectively. Then, RNA molecules from all the 12 collected tissues were equally mixed before the subsequent transcriptome sequencing with the Illumina HiSeq X Ten platform according to the manufacturer's protocol. Briefly, 3 µg of RNA molecules was used for library construction. After the library construction including purification, fragmentation, cDNA synthesis, adaptor ligation, and fragment selection, the transcripts were sequenced with Illumina HiSeq X Ten (Illumina Inc., San Diego, CA, USA) system using the paired-end 150-bp mode.

Evaluation of the Characters of the C. argus var. Genome
The short-read sequencing data of C. argus var. from the Illumina platform were used to evaluate the genome characters with the Kmer-based method. The sequencing data were first quality checked and filtered before the analysis. HTQC was used for low-quality base/read filtering, and FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was applied for quality control. All adapter sequences residing in reads were removed, and paired-end reads with more than 10% ambiguous bases or with more than 50% low-quality bases (Phred score <5) were filtered out. The 17mers were generated from the sequencing data using the Jellyfish package (Marcais and Kingsford, 2011), and the frequency distribution of all 17mers was plotted to illustrate the genome characters.
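The read-filtering thresholds described above can be illustrated with a minimal Python sketch. It is not the HTQC implementation; the function names, the Phred+33 quality encoding, and the rule that a pair is discarded when either mate fails are assumptions made here for illustration only.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding


def fraction_ambiguous(seq: str) -> float:
    """Fraction of bases in the read that are called as N."""
    if not seq:
        return 1.0
    return seq.upper().count("N") / len(seq)


def fraction_low_quality(qual: str, min_phred: int = 5) -> float:
    """Fraction of bases whose Phred score is below min_phred."""
    if not qual:
        return 1.0
    low = sum(1 for ch in qual if ord(ch) - PHRED_OFFSET < min_phred)
    return low / len(qual)


def keep_read(seq: str, qual: str, max_n: float = 0.10, max_low: float = 0.50) -> bool:
    """A read passes if at most 10% of bases are N and at most 50% are low quality."""
    return fraction_ambiguous(seq) <= max_n and fraction_low_quality(qual) <= max_low


def keep_pair(read1, read2) -> bool:
    """Assumption: a pair is kept only when both mates pass the filter."""
    return keep_read(*read1) and keep_read(*read2)


# Hypothetical reads given as (sequence, quality string) tuples.
good = ("ACGTACGTAC", "IIIIIIIIII")
too_many_n = ("ANNTACGTAC", "IIIIIIIIII")   # 20% ambiguous bases
low_quality = ("ACGTACGTAC", "######IIII")  # 60% of bases with Phred 2
```

In practice the thresholds would be applied while streaming FASTQ records, so that neither read file has to be held in memory.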

De Novo Contig Assembly of the C. argus var. Genome
Long reads from the whole-genome PacBio sequencing were used for the de novo genome assembly. Falcon 2.1.4 (Chin et al., 2016) was used for the genome assembly with the parameters listed in Supplementary Table S1. To eliminate unavoidable base errors in the assembly, the PacBio long reads and Illumina short reads were then used for base correction. First, the PacBio long reads were mapped to the preliminary genome using blasr (Chaisson and Tesler, 2012), and the alignments were used to polish the sequence with the arrow utility (ARROW in GCpp v1.9.0) (Chin et al., 2013) with a minCoverage of 15. Second, two rounds of polishing with Illumina short reads were performed using BWA (Abuín et al., 2015) for read alignment and Pilon 1.23 (Walker et al., 2014) for base correction.

Chromosome Construction Using Interaction Information From Hi-C Data
The muscle tissue was used for the Hi-C library construction and sequencing. Two micrograms of tissue from the same individual used for the genome assembly was collected. The chromatin cross-linking, lysis, digestion, biotin marking, ligation, cross-linking reversal, and DNA fragment collection for Hi-C library construction followed the experimental process of a previous study (Xu et al., 2018b). The DNA molecules were then used for library construction and sequencing as in a traditional genome sequencing project, using the Illumina HiSeq X Ten platform (Illumina, San Diego, CA, USA).
The interaction frequencies among contigs were estimated from the sequencing data; however, the data analysis differed from that of a traditional whole-genome sequencing project, since abundant chimeric reads are present in Hi-C library sequencing. We first applied an iterative alignment strategy to align the reads to the assembled contigs using Bowtie (Langmead, 2010) in single-end mode. Only read pairs for which both ends aligned uniquely to the contigs were retained for the following analysis. Then, the interaction frequencies among contigs were evaluated using the hiclib Python library (Imakaev et al., 2012). Finally, contigs were clustered, ordered, and oriented to restore their relative locations along chromosomes using the agglomerative hierarchical clustering method implemented in Lachesis (no version) (Burton et al., 2013).
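The clustering step can be illustrated with a toy Python sketch of agglomerative hierarchical clustering with average linkage. This is not the Lachesis implementation: the function names, the link-count dictionary, and the absence of any contig-length normalization are simplifying assumptions made for illustration.

```python
from itertools import combinations


def average_linkage(group_a, group_b, links):
    """Mean Hi-C link count over all contig pairs drawn from the two groups."""
    total = sum(links.get(frozenset((x, y)), 0.0) for x in group_a for y in group_b)
    return total / (len(group_a) * len(group_b))


def cluster_contigs(contigs, links, n_groups):
    """Repeatedly merge the two groups with the highest average linkage
    until n_groups groups (candidate chromosomes) remain."""
    clusters = [[c] for c in contigs]
    while len(clusters) > n_groups:
        best_pair, best_score = None, -1.0
        for i, j in combinations(range(len(clusters)), 2):
            score = average_linkage(clusters[i], clusters[j], links)
            if score > best_score:
                best_pair, best_score = (i, j), score
        i, j = best_pair
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Hypothetical link counts between four contigs; keys are unordered pairs.
toy_links = {
    frozenset(("ctg1", "ctg2")): 50.0,
    frozenset(("ctg3", "ctg4")): 40.0,
    frozenset(("ctg1", "ctg3")): 2.0,
    frozenset(("ctg2", "ctg4")): 1.0,
}
groups = cluster_contigs(["ctg1", "ctg2", "ctg3", "ctg4"], toy_links, n_groups=2)
```

For this study the target number of groups would be 24, matching the haploid chromosome number implied by the 2n = 48 karyotype. Ordering and orienting contigs within each group is a separate step that this sketch does not cover.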

Gene Model Prediction and Functional Annotations
Before the protein-coding gene annotation in the C. argus var. genome, both tandem and interspersed repeats were predicted and masked. Tandem Repeat Finder (TRF 4.09) (Benson, 1999) was used to detect tandem repeats in the genome. RepeatMasker 4.1.2 and RepeatProteinMask were used to predict interspersed repeats based on the Repbase database (Jurka et al., 2005). The de novo prediction was applied with RepeatMasker 4.1.2 using the combined library from RepeatModeler 2.0.2a and LTR-FINDER 1.0.7 (Xu and Wang, 2007). All repeat types were merged to eliminate the redundancy.
To predict protein-coding genes in the C. argus var. genome, de novo, homolog-based, and RNA-seq-based methods were used. De novo prediction was performed with Augustus 3.4.0 (Stanke et al., 2006). For the homolog-based method, protein sequences of Anabas testudineus, Danio rerio, Oryzias latipes, Tetraodon nigroviridis, and Xiphophorus maculatus were downloaded from the Ensembl database release 96 (Flicek et al., 2007) and aligned to the C. argus var. genome with the TBLASTN utility (Altschul, 2012); the alignments were then processed with GeneWise 2.4.1 (Birney et al., 2004) to obtain the gene models. TopHat 2.1.1 was used to map RNA-seq data to the C. argus var. genome (Trapnell et al., 2009), and Cufflinks 2.2.1 was then used to assemble transcripts (Ghosh and Chan, 2016). The MAKER 3.01.02 (Cantarel et al., 2008) package was used to merge all gene models from the de novo, homolog-based, and RNA-seq-based methods.
All the final protein-coding genes were searched against the NR, TrEMBL, Swiss-Prot, and COG databases using the BLAST 2.11.0+ utility (Altschul, 2012) with a maximal e-value of 1e-5. Blast2GO 5.2.5 (Conesa et al., 2005) was used for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation.

Evolutionary Dynamics of the C. argus var. Genome
Coding sequences and corresponding protein sequences for 11 fish species, including Callorhinchus milii, Lepisosteus oculatus, D. rerio, Gadus morhua, X. maculatus, O. latipes, Anabas testudineus, Gasterosteus aculeatus, Larimichthys crocea, T. nigroviridis, and Takifugu rubripes, were downloaded from the Ensembl database. The longest transcript and its encoded protein sequence were selected for each gene locus, and the OrthoMCL 2.0.9 pipeline (Li et al., 2003) with default settings was used to identify orthologous relationships within and among species. Then, the protein sequences of one-to-one orthologous genes were aligned with MUSCLE (Edgar, 2004) and converted into nucleotide alignments using pal2nal (Suyama et al., 2006). Hypervariable regions of the alignments were removed with Gblocks 0.91b (Castresana, 2000) with default settings, and the remaining sequences were concatenated and fed into RAxML (Stamatakis, 2014) to reconstruct the relationships among these species. One hundred rapid bootstrap replicates (Stamatakis et al., 2008) were performed to assess the robustness of the topology. Based on the topology and the alignment matrix, the divergence times among these species were estimated using MCMCTREE from the PAML 4.9 package (Yang, 2007) with calibration times obtained from the TimeTree database (Kumar et al., 2017; De Bie et al., 2006). Furthermore, candidate genes probably subject to positive selection were identified using CODEML by comparing the difference in likelihood values between two settings of model A (fix_omega = 1, omega = 1 vs. fix_omega = 0, omega = 1.5) against the χ² distribution.
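The likelihood comparison in the positive selection test can be sketched as a likelihood ratio test in Python. The text states only that the likelihood difference was compared against the χ² distribution; the use of one degree of freedom and the function name below are assumptions made for illustration.

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio test p-value, assuming a chi-square null with 1 degree of freedom.

    lnl_null: log-likelihood of model A with omega fixed (fix_omega = 1, omega = 1)
    lnl_alt:  log-likelihood of model A with omega estimated (fix_omega = 0)
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical log-likelihoods for one gene (test statistic = 10).
p_value = lrt_pvalue(lnl_null=-1005.0, lnl_alt=-1000.0)
```

When many genes are tested, the resulting p-values would normally be corrected for multiple testing before candidate genes are reported.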

Specific Gene Identification in the C. argus Genome
The CDS sequences from the C. argus var. and C. argus genomes were aligned against each other with BLASTN (Altschul, 2012). Genes from the two genomes with an alignment ratio larger than 80% were considered shared genes. Genes without any hit were recognized as non-hit genes. To identify genome-specific genes, non-hit genes from the C. argus genome were aligned to the C. argus var. genome using exonerate (https://github.com/nathanweeks/exonerate) with the parameters --model est2genome --percent 80 --showtargetgff 1. If a gene had no hit in the C. argus var. genome, it was identified as a C. argus-specific gene. C. argus var.-specific genes were identified in the same way.
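The first classification step can be sketched in Python as follows. The definition of the alignment ratio (longest aligned length divided by CDS length), the separate category for genes with hits below the threshold, and all names are assumptions made for illustration; the text does not specify them.

```python
def classify_genes(gene_lengths, hits, min_ratio=0.80):
    """Split genes into shared, below-threshold, and non-hit sets.

    gene_lengths: dict mapping gene ID to CDS length (bp)
    hits: iterable of (query_gene, subject_gene, aligned_length) tuples,
          e.g. parsed from tabular BLASTN output
    """
    best_aligned = {}
    for query, _subject, aligned in hits:
        best_aligned[query] = max(best_aligned.get(query, 0), aligned)

    shared, below_threshold, non_hit = set(), set(), set()
    for gene, length in gene_lengths.items():
        if gene not in best_aligned:
            non_hit.add(gene)  # candidates for the exonerate check
        elif best_aligned[gene] / length > min_ratio:
            shared.add(gene)
        else:
            below_threshold.add(gene)
    return shared, below_threshold, non_hit


# Hypothetical genes and hits.
lengths = {"geneA": 1000, "geneB": 1000, "geneC": 500}
blast_hits = [("geneA", "t1", 900), ("geneA", "t2", 300), ("geneB", "t9", 400)]
shared, below, non_hit = classify_genes(lengths, blast_hits)
```

Only the non-hit set would then be aligned to the other genome assembly, so that genes missing from the annotation but present in the sequence are not reported as genome specific.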

De Novo Assembly of the Genome
Based on the Illumina HiSeq X Ten platform, a total of 42.99 Gb of cleaned data, with ~67× coverage of the estimated genome size, was generated (Table 1). Using these high-quality whole-genome sequencing data, we applied the Kmer-based method to estimate the genome size, heterozygosity, and repeat content; the results are shown in Figure 2. Because 17mers with extremely low frequency most likely resulted from base errors in PCR or sequencing, all 17mers with a frequency lower than 5 were excluded from the genome character estimation. We estimated that the genome size of C. argus var. was 668 Mb. We observed low heterozygosity in the Kmer plot, with a whole-genome heterozygosity of 0.096%. The heterozygosity of C. argus var. was lower than that reported previously (Xu et al., 2018a), and even lower than that of other aquaculture species generated from gynogenesis (Wang et al., 2015), implying that the genetic diversity of the C. argus var. population might be rather low.
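A Kmer-based genome size estimate of this kind can be sketched in Python. The estimator used here (total Kmer count divided by the depth of the main peak) is a common approach and is an assumption; the text does not give the exact formula. The histogram values are hypothetical.

```python
def estimate_genome_size(histogram, min_freq=5):
    """Estimate genome size from a Kmer frequency histogram.

    histogram: dict mapping Kmer frequency (depth) to the number of
               distinct Kmers observed at that frequency
    min_freq:  Kmers seen fewer times than this are treated as errors
    Returns (estimated_size, peak_depth).
    """
    kept = {freq: n for freq, n in histogram.items() if freq >= min_freq}
    if not kept:
        raise ValueError("no Kmers left after low-frequency filtering")
    # Depth of the main peak: the frequency shared by the most distinct Kmers.
    peak_depth = max(kept, key=lambda freq: kept[freq])
    total_kmers = sum(freq * n for freq, n in kept.items())
    return total_kmers / peak_depth, peak_depth


# Hypothetical histogram: error Kmers at low depth, main peak at depth 20.
toy_histogram = {1: 1000, 2: 300, 19: 10, 20: 50, 21: 10, 40: 5}
size, peak = estimate_genome_size(toy_histogram)
```

A histogram in this form could be built from the two-column output of a Kmer counter such as Jellyfish, with the frequency in the first column and the number of distinct Kmers in the second.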
To de novo assemble the C. argus var. genome, we constructed and sequenced a 20-kb DNA library using the PacBio Sequel platform and obtained 4,650,237 subreads totaling 52.85 Gb, representing ~83× coverage of the estimated genome size (Table 1). The N50 length of the subreads was 18 kb, with a maximal subread length of 94 kb (Supplementary Figure S1). After the preliminary assembly using Falcon 2.1.4, a 640.6-Mb genome with 749 contigs was obtained. The N50 and maximal length of the contigs were 11.91 and 27.5 Mb, respectively (Supplementary Table S2). After rounds of polishing based on the long reads used for the de novo assembly and the short reads used for the genome survey, the final genome was 644.1 Mb in 749 contigs with an N50 length of 11.98 Mb (Supplementary Table S2). The completeness of the assembled genome was validated by Benchmarking Universal Single-Copy Orthologs (BUSCO v3.0) analysis with the actinopterygii_odb9 database. As a result, 4,458 (97.2%) of the 4,584 BUSCO genes were completely identified in the genome, comprising 4,334 (94.5%) single-copy and 124 (2.7%) multi-copy genes (Table 2), suggesting excellent completeness of the genome assembly.
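For reference, the N50 statistic quoted for subreads and contigs can be computed as in the following minimal Python sketch; the function name and the toy lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one sequence length")


# Hypothetical contig lengths (total 54; half of the total is 27).
toy_n50 = n50([10, 9, 8, 7, 6, 5, 4, 3, 2])
```

Because N50 depends only on the length distribution, it measures continuity and says nothing about base-level accuracy, which is why it is reported alongside the BUSCO completeness assessment.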

Scaffolding of the Genome
Although we obtained a genome with fewer than 1,000 contigs (Table 3), the genome sequences were still fragmented, since the karyotype of C. argus is 2n = 48 according to previous studies (Song et al., 2012; Zhong et al., 2016). The relative position and orientation of those contigs along chromosomes are crucial for comparative genomic analysis, especially for studies of chromosome evolution. Many traditional scaffolding strategies have been developed to anchor contigs onto chromosomes, such as genetic mapping, physical mapping, BAC sequencing, and large insert-size mate-pair sequencing; however, those methods are time-consuming, labor-intensive, and costly. In this work, we applied the Hi-C technique to build the first chromosome-level assembly of C. argus var.
From the Hi-C sequencing, we obtained more than 78 Gb of sequencing data, covering 122× of the C. argus var. genome (Table 1). As a result, a 644.1-Mb genome with 562 sequences and contig/scaffold N50 lengths of 11.78/27.76 Mb was obtained (Table 3). It is worth noting that gaps between contigs were filled with 100-bp runs of Ns; therefore, the gap lengths in the genome do not represent real or estimated lengths. The total length of the 24 longest sequences, representing the 24 chromosomes of C. argus var., was 617.6 Mb, covering more than 95.8% of the contigs at the base level (Supplementary Figure S2). A total of 538 sequences remained unplaced on chromosomes after the Hi-C analysis. The contig N50 length of the unplaced sequences was 41.7 kb, far shorter than that of the whole genome. We attributed the failure to place these sequences to insufficient interaction information with other contigs owing to their short length.

Annotation of the Genome
More than 20% of the genome was predicted to consist of repetitive elements, and long interspersed nuclear elements (LINE) represented the most abundant repeat type in the genome (Table 4). To assist the annotation of gene structures, about 10.8 Gb of transcriptome data (Table 1), representing almost all of the expressed genes from the 12 collected tissues, was generated. Together with the other two annotation strategies, the de novo and homolog-based methods, we finally obtained 22,593 protein-coding genes in the C. argus var. genome (Table 5), of which 96.61% were supported by transcripts with at least a 50% overlap. The annotation of gene structures was quality controlled by comparison with closely related species. As shown in Figure 3, the gene structures of C. argus var. were comparable to those of A. testudineus, D. rerio, O. latipes, T. nigroviridis, and X. maculatus. Based on the chromosome assembly, the density of genes, repeat types, and GC content were plotted along the chromosomes (Supplementary Figure S5). Generally, the distributions of repeats and genes were inversely correlated. Of all the predicted protein-coding genes, more than 91.9% were annotated in at least one public database (Table 6).
We also compared the coding sequences of C. argus var. with those of C. argus and calculated the Ks value for each gene. The median Ks value was 0.008, which is much larger than the distance estimated from the COI gene but still within the intraspecific range (Zhou et al., 2015).

The Improvement of the C. argus var. Genome Compared to the Previous Genome
Although a previous study reported one version of the genome for C. argus, our new genome exhibited significant improvements in both continuity and completeness. First, the published C. argus genome was assembled from short reads generated by next-generation sequencing technology (Xu et al., 2017), whereas our genome was assembled using long reads from the PacBio sequencing platform. The contig N50 length in our work (11.98 Mb) was more than 140 times that of the previous version (81.4 kb), while the contig number was about 40 times smaller than that of the previous version (Table 3), indicating a remarkable improvement in the continuity of the reference genome. As far as we know, the contig N50 length of 11.98 Mb for the new genome surpasses that of the majority of teleost genomes, including model teleost species such as zebrafish and medaka. Second, the BUSCO comparison between our genome and the previous one showed that 97.2% of BUSCO genes were identified in our genome, but only 82.9% were detected in the old genome, suggesting that our genome is more complete. This might also explain why more protein-coding genes were predicted in this work. Continuity and completeness are crucial for a reference genome, since fragmentation might break the continuity of gene sequences and spoil genome comparisons among species, leading to incomplete and inaccurate alignments. Therefore, the merits of our genome make the new reference more suitable for gene and genome sequence analyses. More importantly, our genome sequences were anchored onto chromosomes by the Hi-C technique, providing an essential reference for studies of genome evolution at the chromosome level. As with the contigs, the scaffold number of C. argus var. in this work was much lower, whereas the scaffold N50 size was much larger, than that of C. argus (Table 3).
With the development of functional genomics and the increasing research interest in economic species, many genetic analyses of Channa argus, such as quantitative trait locus (QTL) mapping and genome-wide association studies (GWAS), will be performed. Those genetic analyses rely heavily on a chromosome-level assembly.

Evolutionary Analyses of the Genome
Using the OrthoMCL 2.0.9 pipeline, a total of 3,166 single-copy ortholog groups were detected (Supplementary Figure S3). After alignment and removal of gapped regions, a concatenated alignment matrix with a length of 1,877,704 bp was created based on these genes. The recovered phylogeny showed that C. argus var. and another species belonging to Perciformes, A. testudineus, are sister species with high confidence (Figure 4). The phylogenetic tree of the species in this work was consistent with a previous study (Xu et al., 2017). Divergence time estimation showed that the two sister species diverged about 85.6 million years ago (Figure 4). The dynamics of gene family membership in the genome may be the result of natural selection. A total of 514 and 2,664 gene families were probably subject to expansions and contractions, respectively, within the genome (Supplementary Figure S4). The top enriched pathways of these expanded gene families included olfactory transduction (Supplementary Table S3). Another aspect of the impact of natural selection is the substitution of amino acids. Generally, a change in the encoded amino acids is harmful to the species. However, some non-synonymous substitutions may increase the fitness of the species, especially when the living environment changes, and genes harboring these changes are termed positively selected genes (PSGs). Using the branch-site model, a total of 214 genes were identified as candidate PSGs in C. argus var., and they may be pivotal for the survival of the albino individuals. Functional analysis showed that genes participating in the pathways of non-homologous end-joining, protein export, and basal transcription factors were most significantly enriched (Supplementary Table S4). In particular, genes participating in the homologous recombination, mismatch repair, and DNA replication pathways were also significantly enriched. These genes, including rad50, rad51d, brcc36, msh6, and dna2, may protect the albino fish from ultraviolet radiation.

Genomic Comparison of C. argus var. With C. argus
Using sequence BLAST searches among genes from the C. argus var. and C. argus genomes, we found 425 and 136 specific genes for C. argus and C. argus var., respectively (Supplementary Tables S5, S6).
Among the C. argus-specific genes annotated in KEGG, the two largest groups were immune system (26 genes) and signal transduction (22 genes) (Figure 5A). To reveal the possible contribution of the genome-specific genes of C. argus and C. argus var., functional enrichment analysis was performed on the specific genes of each genome. We found that C. argus-specific genes were enriched in immunologic processes, such as antigen processing and presentation, suggesting that C. argus and C. argus var. might respond differently to pathogen infection (Figure 5B). Meanwhile, C. argus-specific genes were also significantly enriched in fatty acid metabolism, such as linoleic acid metabolism (Figure 5B). This result provides useful hints for subsequent phenotype comparisons between C. argus and C. argus var. Interestingly, we found that adcy5 (adenylate cyclase 5) was specifically identified in the C. argus genome (Figure 5C). Previous studies have shown that adcy5 is required for melanophore and pigmentation patterns in fish (Kottler et al., 2015); therefore, the absence of adcy5 might be related to the albinism of C. argus var., which needs further validation in future investigations.

CONCLUSION
On the basis of long-read sequencing and Hi-C scaffolding technology, we de novo assembled a nearly chromosomal-level genome for C. argus var. The continuity and completeness of the newly assembled genome were significantly improved compared to the former assembly based on next-generation sequencing technology. We also annotated the genome and performed comparative analyses of the genome with other fish species. Phylogenetic analyses and divergence time estimations showed that C. argus var. and A. testudineus, its closest relative with a publicly available whole-genome sequence at present, diverged about 85.6 million years ago. A number of expanded gene families and positively selected genes in the C. argus var. genome were also detected, and these genes may be pivotal in the environmental adaptation of these albino individuals. Using comparative genomics, the putative genome-specific genes of C. argus var. and C. argus were detected and functionally analyzed. Based on our results, adcy5 (adenylate cyclase 5) was absent from the C. argus var. genome, which might be related to the albinism of C. argus var. Further investigation of these genes may provide insights into the molecular mechanisms of albinism in fish, and even in other species including humans. The high-quality genome of the albino fish C. argus var. provides a valuable resource for understanding evolutionary events in fish, especially fish albinism and adaptation to the environment.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found as follows: NCBI [accession: PRJNA522012].

ETHICS STATEMENT
The animal study was reviewed and approved by the Animal Ethics Committee, Southwest University.

AUTHOR CONTRIBUTIONS
CZho, SX, and ML conceived and designed the study. XD, JW, JS, HY, and GL collected the samples. HL, YZo, GL, and GK performed the DNA sequencing and Hi-C experiments. HL and YZh performed the RNA sequencing. YZh, DY, and SX estimated the genome size, assembled the genome, and assessed the assembly quality. CZho, SX, and CZha performed the genome annotation and functional genomic analysis. CZho, SX, and ML wrote the manuscript. All authors contributed to the article and approved the submitted version.

SUPPLEMENTARY MATERIAL
Supplementary Figure 1 | The length distribution of subreads generated from the PacBio sequencing platform. The peak around 20 kb represented the insert length during the sequencing library construction.
Supplementary Figure 2 | The interaction frequency matrix among contigs generated from the Hi-C sequencing data. The interaction strength was colored by the logarithm of the contact density from red (high) to white (low).