Chromosome-Level Genome Assembly of Cyrtotrachelus buqueti and Mining of Its Specific Genes

Background: The most severe insect damage to bamboo shoots is the bamboo-snout beetle (Cyrtotrachelus buqueti). Bamboo is a perennial plant that has significant economic value. C. buqueti also plays a vital role in the degradation of bamboo lignocellulose and causing damage. The genome sequencing and functional gene annotation of C. buqueti are of great significance to reveal the molecular mechanism of its efficient degradation of bamboo fiber and the development of the bamboo industry. Results: The size of C. buqueti genome was close to 600.92 Mb by building a one paired-end (PE) library and k-mer analysis. Then, we developed nine 20-kb SMRTbell libraries for genome sequencing and got a total of 51.12 Gb of the original PacBio sequel reads. Furthermore, after filtering with a coverage depth of 85.06×, clean reads with 48.71 Gb were obtained. The final size of C. buqueti genome is 633.85 Mb after being assembled and measured, and the contig N50 of C. buqueti genome is 27.93 Mb. The value of contig N50 shows that the assembly quality of C. buqueti genome exceeds that of most published insect genomes. The size of the gene sequence located on chromosomes reaches 630.86 Mb, accounting for 99.53% of the genome sequence. A 1,063 conserved genes were collected at this assembled genome, comprising 99.72% of the overall genes with 1,066 using the Benchmark Uniform Single-Copy Orthology (BUSCO). Moreover, 63.78% of the C. buqueti genome is repetitive, and 57.15% is redundant with long-term elements. A 12,569 protein-coding genes distributed on 12 chromosomes were acquired after function annotation, of which 96.18% were functional genes. The comparative genomic analysis results revealed that C. buqueti was similar to D. ponderosae. Moreover, the comparative analysis of specific genes in C. buqueti genome showed that it had 244 unique lignocellulose degradation genes and 240 genes related to energy production and conversion. At the same time, 73 P450 genes and 30 GST genes were identified, respectively, in the C. buqueti genome. Conclusion: The high-quality C. buqueti genome has been obtained in the present study. The assembly level of this insect’s genome is higher than that of other most reported insects’ genomes. The phylogenetic analysis of P450 and GST gene family showed that C. buqueti had a vital detoxification function to plant chemical components.


INTRODUCTION
Bamboo is of great value in terms of both qualities of life and economics. Moreover, it is widely used in construction, woodbased panel, paper, bamboo tourism, and agricultural industries (Scurlock et al., 2000). It grows fast and is the most closed growing plant in the world. The bamboo forest area of China accounts for almost one-quarter of the total bamboo forest area in the world (Littlewood et al., 2013). Bamboo shoots also have rich nutritional value and health functions (Chauhan et al., 2016). However, Cyrtotrachelus buqueti (NCBI:txid1892066) is an oligotrophic bamboo forest pest that feeds exclusively on bamboo shoots. Moreover, it can destructively damage bamboo shoots, resulting in the failure of bamboo shoots to grow into bamboo, and greatly reduce the output of bamboo forest Wang et al., 2005).
Cyrtotrachelus buqueti, a bamboo snout beetle, belongs to Curculionidae family of the Coleoptera. The larvae of C. buqueti grows in the soil during autumn and emerge as imagoes during the following summer (June and July). The adults of C. buqueti fly to caespitose bamboo shoot areas, feed on the bamboo shoots, and lay eggs in the bamboo shoot tips after mating Wang et al., 2005; Figure 1). These beetles are mainly distributed in Fujian, Guangdong, Jiangsu, Sichuan, Guizhou, Chongqing, Guangxi and other provinces in China and Vietnam, Laos, Myanmar, and Thailand (Wen and Lu, 2006;Xiao, 2009;Li, 2010). In Sichuan Province, C. buqueti is a severe pest in caespitose bamboo forests with a risk estimate of more than 40% in 67,000 hectares of bamboo (Yang et al., 2009;Luo et al., 2018b). However, only a few studies concerning C. buqueti have been reported to date, and studies of this pest have focused on its biological features and chemical regulation. Some experiments have, for instance, been dedicated to the reproductive behavior of insects  and pest management of C. buqueti . Yang et al. (2009) investigated the effect of larval density on the number and harm of wormholes, analyzed bamboo shoot volatiles, and explored insect EAG responses (Yang et al., 2010). The semiochemical cuticular components of C. buqueti adults have also been identified and extracted (Mang et al., 2012). Previous studies' results showed that C. buqueti played an essential role in the degradation of bamboo lignocellulose (Luo et al., 2018a(Luo et al., , 2019a. However, we don't know how many genes in C. buqueti genome are involved in the degradation of bamboo cellulose, how these genes are distributed, and which genes play an essential role in the degradation of bamboo lignocellulose. Interestingly, C. buqueti larvae only need to feed on bamboo shoots for about 20 day to complete their whole life activities. However, the molecular mechanism of energy conversion and energy storage required for a lifetime is still unclear. The beetles for which genomic sequencing has been completed include Dendroctonus ponderosae (Keeling et al., 2013), Tribolium castaneum (Tribolium Genome Sequencing Consortium et al., 2008), Anoplophora glabripennis (McKenna et al., 2016), Rhynchophorus ferrugineus (Hazzouri et al., 2020), and so on. These beetles are feeding plants and previous studies have reported specific genes for cellulose degrading in their genomes. To dissect the functional gene distribution of degrading bamboo fibers and their molecular evolutionary relationships in the genome of C. buqueti, a high-quality genome sequence was obtained by whole genome sequencing in this study.
"Based on our observation" in this study, we have achieved three goals. Firstly, the chromosome-level genome of C. buqueti has been assembled and mapped by Illumina sequencing, long-read Pacific Biosciences (PacBio) sequencing and high-throughput chromosome conformation capture (Hi-C) technology. Secondly, we've mined the genes Carbohydrate-Active enZymes (CAZy) involved in cellulose degradation in C. buqueti genome and other related species' genomes. Moreover, the similarities and differences of CAZy gene family between C. buqueti and other related species were compared. Finally, we have identified that the genes were related to energy production and conversion in C. buqueti genome, and analyzed the evolutionary relationship of P450 and GST gene family in C. buqueti genome. The availability of high-quality chromosomelevel genome sequence provides a theoretical basis for revealing bamboo lignocellulose degradation, energy conversion, energy storage and detoxification function of C. buqueti.

Insect Collection and Treatment
Male individuals from C. buqueti were collected from Muchuan (N103 • 98 , E28 • 96 ), Sichuan, in early August 2017. After emerging bamboo shoots for 3 days, gathered all the individuals. At Sichuan Province Key Lab for Bamboo Pest Control and Resource Development, fed bamboo shoots obtained from the bamboo garden on these male individuals of C. buqueti under the following conditions in a laboratory greenhouse: 26 • C, 70% humidity, and 16 h light/8 h dark. Used two male individuals fed for 1 week for genome and transcriptome sequencing, respectively.

Genome and Transcriptome Sequencing
According to the manufacturer, fresh mixed tissue samples were obtained from a male individual of C. buqueti refer to a Universal Genomic DNA Kit CW2298 (Cwbiotech Co., Ltd., Beijing, China)'s operating manual. Next, 1% agarose gel electrophoresis of the extracted genomic DNA was performed, after which the concentration was quantified using a Qubit 3 fluorometer (Thermo Fisher, Waltham, MA, United States). Until genome sequencing, long-read Pacific Biosciences (PacBio) genomic libraries (SMRTbell libraries) were built by shearing DNA into ∼20 kilobase (kb) fragments using a Covaris g-TUBE fragment (KBiosciences part No. 520079); they were then enzymatically repaired and 20-kb SMRTbell libraries were designed using a 1.0 DNA Template Prep Package (PacBio part No. 100-259-100). The size of the DNA fragments was subsequently measured using a Bioanalyzer 2100 12K DNA Chip assay (Agilent part No. 5067-1508), after which the templates were size-selected to enrich large DNA fragments (>10 kb) using BluePippin (Sage Science, Inc., Beverly, MA, United States). An Agilent Bioanalyzer 12 kb DNA Chip (Agilent Technologies, Santa Clara, CA, United States) and a Qubit fluorimeter (Invitrogen, Carlsbad, CA, United States) were used to examine the fragments' content and measure the library. Binding Kit 2.0  was used for the design of the SMRTT.
The complex of BellPolymerase as defined in the protocols. Genome sequencing was performed using a PacBio Sequel sequencer at Biomarker Technologies Corporation (Beijing, China) (Pacific Biosciences, Menlo Park, CA, United States). Samples were mounted and sequenced onto PacBio SMRT v3.0 cells (PacBio part No. 100-171-800) of the Sequel instrument, collected in one film at 360 min per SMRT cell. The total count of k-mers was 44976583431, and the peak k-mer depth was 64. The genome size of C. buqueti was calculated by dividing the total k-mer count by the peak depth, which was 600.92 Mb. The single peak of the k-mer distribution profile indicates that the C. buqueti genome has a low level of heterozygosity.
Finally, introduced MagBead loading (PacBio part No. 100-125-900) to improve the large fragments' enrichment. As a result, a total of nine SMRT cells were programmed, and 48.71 G subread sequences with an average length of 10.53 kb were obtained (Supplementary Table 1).
For short-read sequencing using the protocols offered (San Diego, CA, United States), a single paired-end (PE) library with insert sizes of 270 bp was designed and then sequenced on an Illumina HiSeq X Ten platform as instructed by the manufacturer. The findings showed that generated generated clean sequences in total 51.12 Gb of PE (2 × 150 bp) in total (Supplementary Table 2). Only a single peak was detected in the purified sequences' k-mer depth distribution (Figure 2), showing that the C. buqueti had low heterozygosity in the genome. Moreover, also used these Illumina sequencing data for genome size estimate, assembly adjustment, and assessment.
The total count of k-mers was 44976583431, and the peak k-mer depth was 64. The genome size of C. buqueti was calculated by dividing the total k-mer count by the peak depth, which was 600.92 Mb. The single peak of the k-mer distribution profile indicates that the C. buqueti genome has a low level of heterozygosity.
Muscle tissue from a male individual was ground in liquid nitrogen and RNA was extracted in RNA extraction reagent. RNA-seq library construction was performed using Illumina sequencing after its RNA met the requirements for transcriptome sequencing by agarose gel electrophoresis and measurement of nucleic acid instrument content. The original data were obtained by transcriptome sequencing. Clean reads obtained after quality evaluation, redundant sequences which were deleted were used for its genome sequence annotation, alignment and transcriptome analysis.

Genome Size Estimation
A PE library with 51.12 Gb clean reads from a 270 bp inserts used to estimate the C. buqueti genome size and heterozygosity using the 19-mer distribution method: G = kmer number/average k-mer depth, with the 4ˆk/genome > 200. Found the highest peak in the k-mer distribution curve at a k-mer depth of 64, and a total of 44976583431 k-mer were obtained, which resulted in 38786846512 k-mer after deleting the abnormal k-mer values (Supplementary Figure 1). According to the k-mer distribution, the C. buqueti genome size was 600.92 Mb with 52.27% repeat sequences and 0.20% heterozygosity, and the data were approximately 85 × the coverage of the genome.

High-Throughput Chromosome Conformation Capture Library Construction and Sequencing of the Cyrtotrachelus buqueti Genome
For Hi-C sequencing, chromatin isolation and library building from fresh tissues in C. buqueti was conducted using the Step Genomics animal Hi-C kit (Seattle, WA, United States). Streptavidin beads for library construction were identified as fragments containing contacts. High-throughput chromosome conformation capture libraries were sequenced at Biomarker Technologies Corporation (Beijing, China) using the NextSeq500 platform (Illumina). The  Hi-C raw reads were filtered using Trimmomatic (Bolger et al., 2014) to trim the adapter and low-quality sequences. BWA-aln (Li and Durbin, 2009) was used to align the Hi-C clean reads to the assembled contigs. A total of 55.87 Gb clean data with an 88.14 × coverage depth were obtained. Next, HiC-Pro (Burton et al., 2013) was used to screen and evaluate the Hi-C data divided into Valid Interaction Pairs and Invalid Interaction Pairs.
Hi-C clean reads were aligned into genome sequences using BWA (Li and Durbin, 2009) and clustered by Phase Genomics Hi-C using LACHESIS.

Manual Annotation of Specific Gene Families and Phylogenetic Analysis
Using reciprocal BLAST against NCBI nr and gene familyspecific datasets, the gene families of cytochromes P450, GSTs, and CAZymes (CAZy) were identified in C. buqueti genome. Each gene model was manually annotated, and then the non-redundant translated proteins were aligned with MUSCLE (Li et al., 2003) to the corresponding proteins from several other insect species for which genomes have been sequenced. A maximum-likelihood phylogeny was created with Mega XI (Guindon et al., 2009) and drawn with iTOL 2 (Ivica and Peer, 2021).

Genome Assembly and Annotation
A total of 630.86 Mb genomic sequences were located on chromosomes, accounting for 99.53% of the total sequence length, while the corresponding sequences were 105, accounting for 65.63% of the total sequences (Figure 3 and Supplementary  Table 3). At last, we assembled the genome sequences of C. buqueti; this genome had an overall length of 633.85 Mb and composed of 149 contigs with an N50 length of 27.93 Mb, respectively (Supplementary Table 4). As of August 31, 2021, there are currently 2,241 insect genome assembly versions, including different assemblies of the same insect. Of all the assembled versions, 296 were at the chromosome level, 1,060 were at the contig level, 885 were at the scaffold level. After eliminating different genome assembly versions of the same species and leaving the version with the highest assembly quality, 1,525 insect genomes have been published on NCBI (Supplementary Table 5). Moreover, the maximum value of scaffold N50 in all insect genome assembly is 25.7 Mb. This value is less than the contig N50 (27.93 Mb) of C. buqueti genome. Therefore, the assembly quality of C. buqueti genome is higher than that of most published insect genomes.
The annotation results of C. buqueti genome showed that there are 37,188 genes including protein-coding and nonprotein-coding genes. The total gene length of these genes Frontiers in Ecology and Evolution | www.frontiersin.org was 166,889,308 bp (Supplementary Table 11). The genome annotation results showed that there were 12,569 protein-coding genes in C. buqueti genome. Moreover, all 12,569 proteincoding genes were distributed on 12 chromosomes of C. buqueti (Table 1). A 5,912 protein-coding genes, 47.04% of all proteincoding genes in C. buqueti genome, are annotated by GO, 5,578 protein-coding genes, 44.38% of that, are annotated by KEGG, 8,388 protein-coding genes, 66.74% of that, are annotated by KOG, 9,793 protein-coding genes, 77.91% of that, are annotated by Pfam, 7,880 protein-coding genes, 62.69% of that, are annotated by Swissprot, 11,895 protein-coding genes, 94.64% of that, are annotated by TrEMBL, 11,901 protein-coding genes, 94.69% of that, are annotated by NR, 10,162 protein-coding genes, 80.85% of that, are annotated by Nt. In sum, 12,089 protein-coding genes are annotated by eight database, accounting for 96.18% of all protein-coding genes in C. buqueti genome ( Table 2). The results of pseudogene prediction showed that there were 1,621 pseudogenes in C. buqueti genome (Supplementary Table 12).

Comparative Genomic Analysis
A 109,057 genes were used for comparative genomic analysis, including 12,569 genes in C. buqueti genome, 12,314 genes in Z. nevadensis genome, 12,841 genes in T. castaneum genome, 11,373 genes in A. planipennis genome, 13,886 genes in A. glabripennis genome, 12,102 genes in D. ponderosae genome, 13,886 genes in D. melanogaster genome and 19,439 genes in A. melanoleuca genome. A total of 10,956 genes in C. buqueti genome were assigned to 9,394 gene families, with 112 unique families. These results also indicate that the number of specific gene family in C. buqueti genome is only 112 as well as that in A. planipennis, which was less than that in other related genomes ( Table 3). A maximum-likelihood phylogenetic tree constructed using PhyML (PhyML, RRID:SCR 014629) (Simao et al., 2015) showed that C. buqueti was closely related to D. ponderosae (Figure 3). The CodeML (Schabauer et al., 2012) module in PAML and the Branch Site model were used to analyze the selection pressure of single-copy genes. The rapid evolution single-copy genes were annotated by alignment to the KEGG and GO databases (Figure 4).

Identification and Comparative Analysis of Specific Gene Families
To complete its whole life cycle, C. buqueti a larval only has to eat on bamboo shoots for around 20 days. Furthermore, previous research has revealed that it has a high capacity to breakdown bamboo cellulose (Luo et al., 2018a(Luo et al., , 2019a. However, it is unknown which gene family in the C. buqueti genome is engaged in bamboo cellulose degradation, how it is distributed, or which genes play a key role in bamboo cellulose breakdown. Therefore, its unidentified all the genes in C. buqueti genome that may be involved in bamboo cellulose degradation, but also identified those genes in other related species' genomes. The identification results showed that the number of the CAZy genes in C. buqueti genome was only 244, account for 1.94% of total genes, which was less than that of most related species, just a little more than that of Zootermopsis nevadensis genome (236). In related species' genomes, the number of CAZy gene family in Ailuropoda melanoleuca genome was largest, which was 701. It has been reported that Z. nevadensis has strong lignocellulose degrading enzyme activity (Pester and Brune, 2007). Moreover, A. melanoleuca is a mammal with the same food as C. buqueti as the only food. However, it has been reported that the degradation ability of A. melanoleuca to lignocellulose is not as strong as that of Z. nevadensis (Jin et al., 2021). In brief, it can be inferred that the activity of lignocellulose degrading enzyme may be inversely proportional to the number of CAZy genes in vivo. At the same time, we found that the number of AA, CBM, and CE genes in CAZy gene family in C. buqueti and Z. nevadensis genome was higher than that in A. melanoleuca genome. This finding confirms that C. buqueti has stronger ability to degrade lignocellulose than A. melanoleuca and weaker ability to degrade lignocellulose than Z. nevadensis. Moreover, this finding suggests that some members of AA, CBM, and CE gene family may play a key role in lignocellulose degradation (Table 4). Further research of which members of CAZy gene family in C. buqueti genome play a key role in bamboo cellulose degradation was needed.
Although we have identified 244 CAZy gene family genes in the C. buqueti genome involved in the rapid degradation of bamboo cellulose, which is the energy required for the rapid transformation of its life activities, its energy conversion and storage mechanism still unclear, we analyzed the genes related to energy production and conversion, energy storage in the C. buqueti genome to solve this problem. These results showed 240 genes related to energy production and transformation in C. buqueti genome, which including mitochondrial carrier protein, ATP synthase, Adenylate and so on. There are 11 genes related to fatty acid metabolism, these genes may be related to their energy storage ( Table 5). Although 240 genes related to energy production and conversion have been found, it is not clear how these genes participate in the rapid energy conversion and storage of C. buqueti. Therefore, these mechanisms need to be further studied.
P450 cytochromes and glutathione S-transferases (GSTs) are commonly involved in detoxifying plant chemicals. Some members are likely to be involved in the sequential pathway of metabolizing xenobiotics by making them more polar and excretable. To reveal how C. buqueti detoxifies plant chemical components, we identified the detoxification enzyme gene, the P450 cytochromes and the glutathione S-transferases (GSTs) in its genome. At the same time, we analyzed the number and proportion of detoxification enzyme genes in its genome and other beetles. These results showed 73 P450 genes and 30 GSTs genes, accounting for 0.58 and 0.24% of the total genes, respectively, in the C. buqueti genome. Compared with other beetle genomes, the number of P450 genes in its genome is medium. However, the number of GSTs genes in its genome is relatively large, only less than T. castaneum and A. glabripennis (Table 6).
Tribolium castaneum is the first Coleoptera insect which be completed genome sequencing and annotation. In this study, we analyzed the evolutionary relationship of P450 and GST gene families in C. buqueti and T. castaneum genomes. All P450 family protein members were divided into 12 groups in phylogenetic tree of P450 proteins in C. buqueti and red flour beetle genome. These groups were named Group I to Group XII. In this phylogenetic tree, Group I was the fastest evolving group, and Group XII was the slowest and most conservative group. A 73 and 156 members of the P450 gene family were, respectively, found in the phylogenetic tree of C. buqueti and T. castaneum. Moreover, most P450 gene family members of C. buqueti were mainly distributed in Group II, Group VII and Group XI, most of those in T. castaneum were mainly distributed in Group I and Group IV. These results showed that most of the P450 gene family members of T. castaneum evolved faster than those of C. buqueti. Most members of the P450 gene family of C. buqueti were more conservative and probably play a more important role in detoxifying phytochemicals (Figure 5).
Glutathione S-transferase (GST) is the key enzyme in glutathione binding reaction, it exists in many forms. In the phylogenetic tree of the GSTs gene family in C. buqueti and T. castaneum genome, all the GSTs protein members were divided into 10 groups, Group I to Group X, respectively. Group I was the fastest evolving group, and Group X was the slowest and most conservative group. Interestingly, we found that most of the GST gene family members of C. buqueti were distributed in Group I, while most of the GST memory family members of T. castaneum were distributed in Group IX and Group X. These findings indicate that the evolutionary rate of GST gene family members of C. buqueti is significantly higher than that of T. castaneum (Figure 6).

CONCLUSION
We have assembled a high-quality genome of C. buqueti, which is the pest that does the most bamboo damage. The assembly quality of this insect's genome sequence was better than those of other published insects' genome. In this study, we identified that 244 genes were related to bamboo cellulose degradation in C. buqueti genome. At the same time, we found that there were specific genes related to lignocellulose degradation in a genome, such as some members of AA, CBM, and CE gene families. Moreover, 240 genes related to energy production and conversion were identified in C. buqueti genome. Finally, we compared and analyzed the evolutionary relationship of the P450 and GST gene family between C. buqueti and Coleoptera model insect T. castaneum genome. The evolutionary relationship between the two gene family members showed that the detoxification function of C. buqueti was possible stronger than that of T. castaneum. Genome assembly at a chromosome level of C. buqueti, identification and comparative analysis of its specific genes laid a theoretical foundation for revealing the molecular mechanism of its bamboo degradation, energy conversion, storage and detoxification function.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: NCBI (accessions: PRJNA675312 and PRJNA718062).

AUTHOR CONTRIBUTIONS
CF, WL, CL, YL, YC, XX, HL, JY, SC, XN, SB, and YY collected insect samples, extracted DNA/RNA, and performed transcriptome sequencing and gene expression analyses. CF, CL, WL, YC, XX, HL, JY, and SC performed DNA sequencing, genome assembly, gene annotation, evolution and comparative genomic analyses. CF, WL, CL, SB, and YY wrote and revised the manuscript. CF, CL, WL, XN, SB, and YY conceived strategies, designed experiments, and managed projects. All authors read and approved the manuscript.

FUNDING
This work was supported by grants from the National Natural Science Foundation of China (31470655) and Leshan Normal University's Science and Technology Program (XJR17005, LZD010).