Chromosome Genome Assembly of the Leopard Coral Grouper (Plectropomus leopardus) With Nanopore and Hi-C Sequencing Data

Citation: Wang Y, Wen X, Zhang X, Fu S, Liu J, Tan W, Luo M, Liu L, Huang H, You X, Luo J and Chen F (2020) Chromosome Genome Assembly of the Leopard Coral Grouper (Plectropomus leopardus) With Nanopore and Hi-C Sequencing Data. Front. Genet. 11:876. doi: 10.3389/fgene.2020.00876


INTRODUCTION
As the storehouse of life information, the genome of an organism harbours all of its biological aspects and evolutionary history. Research conducted at the genomic level has become more common, providing important breakthroughs for the comprehensive interpretation of species. Numerous fish species, many with large populations, live in diverse habitats worldwide, and their genomic information is a valuable genetic resource for fisheries. Exploring the genetic information contained in fish genomes can not only reveal how these organisms adapt to various aquatic habitats, but also help to clarify the gene regulatory networks and mechanisms underlying economically relevant traits and important life history phenomena.
Over the past decade, researchers have revealed much fish genome information and associated characteristics (You et al., 2020). For example, whole genome sequencing of Atlantic cod (Gadus morhua) revealed its special immune mechanism (Star et al., 2011), and sequencing of the Atlantic salmon (Salmo salar) genome likewise demonstrated the mechanism of its genome duplication (Lien et al., 2016). Analysis of the Paralichthys olivaceus genome shows that retinoic acid plays an important role in its eye movement and metamorphic development, achieved via the double antagonistic regulation of thyroxine and retinoic acid (Shao et al., 2017). In constructing the whole genome fine map of channel catfish (Ictalurus punctatus), Liu et al. (2016) uncovered the mechanism of its scale formation, and a study of the whole genome of Leuciscus waleckii elucidated its alkaline environment adaptation mechanism (Xu et al., 2017), to name a few impressive cases. Besides providing insight into the molecular mechanisms underpinning biological characteristics, decoding the genome information of fish can lay a sound theoretical foundation for locating the genomic regions that underlie key economic traits. For example, disease resistance traits of channel catfish were mapped by genome-wide association analysis (Geng et al., 2015), and single nucleotide polymorphism (SNP) loci related to fat content traits of carp were found by genome-wide association study (GWAS) analysis (Zheng et al., 2016). In breeding, genomic selection for growth and disease resistance has been carried out in economically important fish species such as Atlantic salmon (Ødegård et al., 2014), rainbow trout (Oncorhynchus mykiss) (Vallejo et al., 2016) and European sea bass (Dicentrarchus labrax) (Palaiokostas et al., 2018). In sum, harnessing genomic information can provide an efficient platform for the in-depth study of the biological and economic characteristics of fish.
The leopard coral grouper, Plectropomus leopardus, belongs to the Serranidae family of Perciformes (Morris et al., 2000). It is an important commercial marine fish, prized both as seafood and as a colourful aquarium fish (Greenfiel, 2002). Because of the high price it commands, this species has been overfished and is now considered under threat by the International Union for Conservation of Nature (IUCN) (Morris et al., 2000). Currently, the genetic resources of the fish are still scarcely known, which greatly hinders both the study and conservation of this species. Like other coral reef fishes, the leopard coral grouper is capable of displaying a variety of body colours (Wu et al., 2016), which can change rapidly in response to light, food, disease and other stresses (Kingsford, 1992; Wang et al., 2015). In our view, it is a perfect representative model for studying the genetic mechanism of body colouring in coral reef fishes. Moreover, the leopard coral grouper can be used as a material for better understanding the mechanism of melanoma (Lerebours et al., 2016), and for gauging the impact of global warming on coral reef ecosystems (Messmer et al., 2017). Decoding the genome information of P. leopardus could yield insight into its ecological significance and accelerate its genetic breeding applications.
FIGURE 1 | The pipelines used for chromosome-level genome assembly of the leopard coral grouper fish.
In this study, we provide a chromosome-level genome assembly of the leopard coral grouper produced with Nanopore sequencing and high-throughput chromosome conformation capture (Hi-C) technologies. Our intent is to decipher the genome information of the leopard coral grouper and lay a theoretical foundation for the analysis of its body-colour mechanism. This genome resource will be useful for the future conservation, molecular breeding, and population genetics of the leopard coral grouper.

Sample Collection, Library Construction, and Sequencing
We collected a female leopard coral grouper from the Qionghai Breeding Base of the Hainan Academy of Ocean and Fisheries Sciences, in Qionghai, China.
To extract DNA from its muscle tissue and blood, a DNA Extraction Kit was used following the manufacturer's protocols. Both the quantity and quality of DNA were determined using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA).
Two paired-end libraries (insert sizes of 500 and 800 bp) were constructed according to standard Illumina procedures. These libraries were sequenced on the HiSeq 2500 platform (Illumina, San Diego, CA, USA) in PE 150 bp mode. Adapters and low-quality reads were removed from the raw data with SOAPfilter (Luo et al., 2012). All the resulting clean reads were then used to estimate the genome size of the leopard coral grouper through a k-mer analysis in the Genome Characteristics Estimation (GCE) software (Liu et al., 2013).
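The k-mer based estimate rests on a simple relation: genome size is approximately the total number of k-mers divided by the depth at the main peak of the k-mer frequency histogram. The following Python sketch illustrates only that relation. It is not the GCE implementation, which additionally models heterozygosity and sequencing errors; the function names and the error cutoff `min_depth` are assumptions made for illustration.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Return {depth: number of distinct k-mers observed exactly `depth` times}."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size as (total k-mers) / (depth of the main peak).

    K-mers seen fewer than `min_depth` times are treated as sequencing
    errors and ignored; this cutoff is a simplifying assumption.
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The main peak is the depth shared by the largest number of distinct k-mers.
    peak_depth = max(usable, key=lambda depth: usable[depth])
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth
```

In practice the histogram would come from a dedicated k-mer counter rather than from the in-memory `kmer_histogram` helper, which is only suitable for toy inputs.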
For each Nanopore library, the gDNA was size-selected (10-50 kb) with a Blue Pippin system (Sage Science, USA) and processed using the Ligation Sequencing 1D kit (SQK-LSK109, Oxford Nanopore Technologies, UK) according to the manufacturer's instructions. Library construction and sequencing, on the GridION X5/PromethION sequencer (Oxford Nanopore Technologies, UK), were carried out at the Genome Center of Nextomics (Wuhan, China). Base calling was performed on FAST5 files by using the ONT Albacore software (v1.2.6) (Sutton et al., 2019), and only the "passed filter" reads, which represent data of generally higher quality, were used for further analyses.
The Hi-C technique was used to construct chromosome-level scaffolds (Dudchenko et al., 2017). Our Hi-C library was constructed according to previously reported procedures (Rao et al., 2014). First, we used formaldehyde to fix the conformation of the high-molecular-weight (HMW) gDNA. Then, the fixed DNA was digested with the MboI restriction enzyme, and the 5' overhangs produced by that digestion were repaired using biotinylated residues. Following the ligation of blunt-end fragments in situ, the isolated DNA was reverse-crosslinked, purified and filtered to remove biotin-containing fragments. Next, DNA fragment end repair, adaptor ligation and polymerase chain reaction (PCR) were performed successively. Finally, the Hi-C library was sequenced on the Illumina HiSeq X platform in 150 bp PE mode.
For gene annotation of the leopard coral grouper genome, transcriptome sequencing was carried out on muscle tissue of P. leopardus. Total RNA was extracted using Trizol reagent (Invitrogen, Carlsbad, CA, USA) and purified using an RNeasy Animal Mini Kit (Qiagen, Valencia, CA) according to the manufacturer's instructions. An Agilent 2100 system (Agilent Technologies, Palo Alto, CA) was used to determine the RNA concentration and the RNA integrity number (RIN). The cDNA library was constructed following the manufacturer's instructions (Illumina, San Diego, CA). Finally, the library was sequenced on a HiSeq 2500 platform (Illumina, San Diego, CA) using paired-end 150 bp reads. Clean data were obtained by removing reads containing adapters and low-quality reads (e.g., N more than 5% and the quality value <20) from the raw data.
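The read-cleaning criteria can be sketched as a per-read predicate. This is a minimal illustration and not the authors' filtering tool. It assumes Phred+33 quality encoding and interprets "quality value <20" as a mean base quality below 20, which the text does not specify; the adapter sequence passed in by the caller is likewise a placeholder.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(char) - offset for char in quality_string) / len(quality_string)


def keep_read(seq, qual, adapter, max_n_fraction=0.05, min_mean_quality=20):
    """Return True if a read passes the adapter, N-content and quality checks."""
    if not seq or len(seq) != len(qual):
        return False  # malformed record
    if adapter and adapter in seq:
        return False  # read contains adapter sequence
    if seq.upper().count("N") / len(seq) > max_n_fraction:
        return False  # more than 5% uncalled bases
    # Assumption: "low quality" means mean base quality below the threshold.
    return mean_phred(qual) >= min_mean_quality
```

For paired-end data, a pair would typically be kept only when both mates pass, for example `keep_read(s1, q1, a) and keep_read(s2, q2, a)`.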

Genome Assembly and Chromosome Anchoring
The Illumina reads, Nanopore sequencing data, and Hi-C reads were used separately at different assembly stages (Figure 1). Specifically, the Illumina reads were used for the genome survey (genome size estimation) and sequence error-correction, the Nanopore sequencing data for de novo contig assembly, and the Hi-C reads for chromosome anchoring.
To obtain the chromosome-level genome, we constructed an interaction matrix from the cleaned Hi-C reads by using HiC-Pro (v2.8.0, default parameters and LIGATION_SITE=GATC) (Servant et al., 2015); the reads were mapped to the de novo assembled contigs with "bwa" (v0.7.15, default parameters) (Li and Durbin, 2009) to establish contacts among the contigs. The BAM files containing the Hi-C linkage information were then filtered again, removing any reads that did not map to the assembled genome within 500 bp of the nearest restriction enzyme site ("juicer" v1.7) (Durand et al., 2016). To assemble the chromosome-level genome based on the genomic proximity signals in the Hi-C data, the 3d-dna (v170123) pipeline was used with the parameters "-m haploid -s 0 -c 24" (Sutton et al., 2019).
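The restriction-site distance filter can be illustrated as follows. This is a simplified Python sketch of the idea, not the juicer implementation: it locates every GATC site (the MboI recognition sequence) in a contig and keeps an alignment only if its start lies within 500 bp of the nearest site. The function names are hypothetical, and using the alignment start position as the reference point is an assumption.

```python
import bisect


def restriction_sites(sequence, site="GATC"):
    """Return the sorted 0-based start positions of every `site` occurrence."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def distance_to_nearest_site(position, sites):
    """Distance from `position` to the closest site, or None if there are no sites."""
    if not sites:
        return None
    index = bisect.bisect_left(sites, position)
    candidates = []
    if index < len(sites):
        candidates.append(sites[index] - position)
    if index > 0:
        candidates.append(position - sites[index - 1])
    return min(candidates)


def keep_alignment(position, sites, max_distance=500):
    """Keep an alignment whose start is within `max_distance` bp of a site."""
    distance = distance_to_nearest_site(position, sites)
    return distance is not None and distance <= max_distance
```

The site positions are computed once per contig and reused for every alignment on it, so each lookup costs only a binary search.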

Genomic Quality Assessment
To evaluate the quality of the assembled genome, its completeness and accuracy were assessed by using short-read mapping and BUSCO (v3.1) (Simão et al., 2015). We aligned Illumina short reads to the genome by using "bwa" (v0.7.15) (Li and Durbin, 2009).

De novo Repeat Sequences and Gene Annotation
The repeat sequences in the leopard coral grouper genome were identified using a combination of homology-based and de novo approaches. First, a homology-based approach was used to detect repeat sequences, after which Tandem Repeats Finder (version 4.07) was applied to search for tandem repeats (Benson, 1999).

Usage Notes
All contig sequences were assembled into chromosomes by using interaction information from the Hi-C sequencing data. Because the true gap sizes between adjacent contigs are unknown, each gap in the resulting chromosome sequences is represented by 500 bp.
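The gap convention has a practical consequence for anyone converting between contig and chromosome coordinates. The sketch below joins ordered contigs with fixed 500-bp gaps and reports where each contig lands in the joined sequence. Filling the gaps with the character "N" is an assumption (the text only states the gap length), and the function names are illustrative.

```python
GAP_LENGTH = 500  # fixed placeholder length for gaps of unknown size


def join_contigs(contigs, gap_length=GAP_LENGTH, gap_char="N"):
    """Concatenate ordered, oriented contigs with fixed-length gaps between them."""
    return (gap_char * gap_length).join(contigs)


def contig_coordinates(contigs, gap_length=GAP_LENGTH):
    """Return 0-based, half-open (start, end) of each contig in the joined sequence."""
    coordinates = []
    start = 0
    for contig in contigs:
        end = start + len(contig)
        coordinates.append((start, end))
        start = end + gap_length
    return coordinates
```

Because every gap has the same nominal length, distances that span a gap in the chromosome sequences should not be read as physical distances.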

Code Availability
The software settings and parameters used in this work are provided below. Genome assembly: (1) minimap2+miniasm: all parameters were set to their defaults; (2) racon: all parameters were set to their defaults; (3) pilon: all parameters were set to their defaults; (4) 3d-dna: -m haploid -s 4 -c 24 -j 10.

Library Construction and Sequencing
After removing any redundant and low-quality reads, a total of 38.56 Gb (43.6X) of clean reads remained, comprising 18.14 and 20.42 Gb from the 500- and 800-bp insert-size Illumina libraries, respectively. From the k-mer analysis of all the clean reads, the genome size was estimated to be 945 Mbp using the Genome Characteristics Estimation (GCE) software. A Nanopore library was constructed and sequenced on the GridION X5/PromethION sequencer, which yielded 76.93 Gb of long-read data. The high-throughput chromosome conformation capture (Hi-C) library was sequenced on the Illumina HiSeq X10 platform (150 bp PE mode) for chromosome-level scaffold construction, yielding a total of 52.75 Gb of paired-end Hi-C reads with an average sequencing coverage of 59.9X (Table 1).
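Sequencing depth is the total number of sequenced bases divided by the genome size. The short Python sketch below makes that arithmetic explicit. The genome size used as the denominator for the depths reported in this section is not stated, so the function takes it as a parameter; the example call, which pairs the Nanopore yield with the 945 Mbp k-mer estimate, is an illustrative assumption rather than a reported value.

```python
def sequencing_depth(total_bases_gb, genome_size_gb):
    """Average depth (fold coverage) = sequenced bases / genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_gb / genome_size_gb


# Illustrative only: Nanopore yield against the k-mer genome size estimate.
nanopore_depth = sequencing_depth(76.93, 0.945)
print(f"Nanopore depth: about {nanopore_depth:.1f}X")
```

The same function can be inverted to check which genome size a reported depth implies, by dividing the yield by the depth.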

Genome Assembly and Chromosome Anchoring
We obtained an assembled genome of the leopard coral grouper containing 1,526 contigs with a total length of 912.66 Mb. The assembly covered 96.5% of the estimated genome size. The contig N50 length was 1.42 Mb (Table 2). Using the Hi-C data, 1,346 contigs were anchored and oriented on 24 chromosomes, comprising 95.2% of the genomic sequence; this chromosome number is consistent with previous karyotype analyses of the leopard coral grouper (Gao et al., 2015). The lengths of the 24 chromosomes ranged from 18.6 to 43.74 Mb (Tables 2, 3; Figure 2).
Note that "at least one database" refers to those genes with at least one hit among the multiple databases searched.
Compared with the genome assemblies of other Perciformes fish, the assembly of P. leopardus is of a higher level (Table 4).
Using the vertebrata_odb9 database, we found that 92.5% of the BUSCO genes were complete in the leopard coral grouper genome. We then aligned the Illumina short reads to the genome using "bwa" (v0.7.15) and found that more than 94.25% of the reads aligned to the reference genome, a high mapping ratio for short-read sequencing data.
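The contig N50 reported above is a standard contiguity statistic: the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of the calculation follows; the function name is illustrative, and the input would be the list of contig lengths from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total.
        if running >= half_total:
            return length
```

Applied to scaffold or chromosome lengths instead of contig lengths, the same function gives the corresponding scaffold-level N50.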

Repeat Sequences and Gene Annotation
A total of 315.75 Mb of repeat sequences (34.59% of the assembled genome) were identified. Among these repeat elements, DNA transposons were the most abundant type, accounting for 17.59% of the genome (160.56 Mb) (Table 5).
A total of 24,700 protein-coding genes were predicted. The average number of exons per gene and the average gene length were 9.7 and 1,777 bp, respectively (Table 6). In all, 24,014 genes were annotated in at least one of the databases; that is, 97.22% of the leopard coral grouper genes were functionally annotated (Table 7).

Comparison With Other Serranidae Fish Genomes
We used LASTZ (v1.02) to compare the leopard coral grouper genome with the red-spotted grouper (Epinephelus akaara) genome (Ge et al., 2019). Figure 3A summarizes the distribution of SNPs, genes and GC content in 100-kb genomic intervals, as well as the interchromosomal relationships of our assembled leopard coral grouper chromosomes. The genomic sequences of the red-spotted grouper showed evidence of synteny with the leopard coral grouper genome: the 24 chromosomes of the red-spotted grouper had a clear one-to-one relationship with the leopard coral grouper chromosomes (Figure 3B). Given these results, we anticipate that the leopard coral grouper genome will contribute to the study of genome evolution in members of the Serranidae family.
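Per-window statistics such as the GC content track in Figure 3A can be computed with a simple fixed-window scan. The Python sketch below shows one way to do so for non-overlapping 100-kb windows. How the authors computed the track is not described, so the handling of uncalled bases (ignoring anything other than A, C, G and T) and the function name are assumptions.

```python
def gc_content_by_window(sequence, window=100_000):
    """Return (window_start, GC fraction) for consecutive non-overlapping windows.

    Only A, C, G and T count towards the denominator; a window with no
    called bases (e.g. inside a gap) yields None.
    """
    results = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window].upper()
        called = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / called if called else None))
    return results
```

The final window of each chromosome is usually shorter than the others, so its value rests on fewer bases.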

DATA AVAILABILITY STATEMENT
The genome assembly sequences and predicted genes were deposited at CNGB under the accession CNA0007316. The Illumina genomic sequencing reads, Nanopore long reads, Hi-C data, and RNA-seq reads were deposited at CNGB under the accession CNP0000859.

ETHICS STATEMENT
The animal study was reviewed and approved by the Institutional Review Board on Bioethics and Biosafety of BGI (No. FT 18134).

AUTHOR CONTRIBUTIONS
FC, JLu, and XY contributed to the study design. YW, SF, JLi, WT, ML, LL, and HH contributed to the fish culture and sample preparation. XZ, YW, and XW performed the bioinformatics analysis. JLu, XY, XZ, and XW wrote the paper. All authors read and approved the final manuscript.