A high-quality genome assembly and annotation of Quercus acutissima Carruth

Liu, Dan; Xie, Xiaoman; Tong, Boqiang; Zhou, Chengcheng; Qu, Kai; Guo, Haili; Zhao, Zhiheng; El-Kassaby, Yousry A.; Li, Wei; Li, Wenqing

doi:10.3389/fpls.2022.1068802

ORIGINAL RESEARCH article

Front. Plant Sci., 24 November 2022

Sec. Plant Bioinformatics

Volume 13 - 2022 | https://doi.org/10.3389/fpls.2022.1068802

A high-quality genome assembly and annotation of Quercus acutissima Carruth

DL
Dan Liu ^1,2
XX
Xiaoman Xie ²
BT
Boqiang Tong ²
CZ
Chengcheng Zhou ¹
KQ
Kai Qu ¹
HG
Haili Guo ²
ZZ
Zhiheng Zhao ¹
YA
Yousry A. El-Kassaby ³
WL
Wei Li ¹^*
WL
Wenqing Li ²^*

1. National Engineering Research Center of Tree Breeding and Ecological Restoration, State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
2. Shandong Provincial Center of Forest and Grass Germplasm Resources, Jinan, China
3. Department of Forest and Conservation Sciences, The University of British Columbia, Vancouver, BC, Canada

Article metrics

View details

Citations

3,8k

Views

1,3k

Downloads

Abstract

Introduction:

Quercus acutissima is an economic and ecological tree species often used for afforestation of arid and semi-arid lands and is considered as an excellent tree for soil and water conservation.

Methods:

Here, we combined PacBio long reads, Hi-C, and Illumina short reads to assemble Q. acutissima genome.

Results:

We generated a 957.1 Mb genome with a contig N50 of 1.2 Mb and scaffold N50 of 77.0 Mb. The repetitive sequences constituted 55.63% of the genome, among which long terminal repeats were the majority and accounted for 23.07% of the genome. Ab initio, homology-based and RNA sequence-based gene prediction identified 29,889 protein-coding genes, of which 82.6% could be functionally annotated. Phylogenetic analysis showed that Q. acutissima and Q. variabilis were differentiated around 3.6 million years ago, and showed no evidence of species-specific whole genome duplication.

Conclusion:

The assembled and annotated high-quality Q. acutissima genome not only promises to accelerate the species molecular biology studies and breeding, but also promotes genome level evolutionary studies.

Introduction

As one of the largest genera in Fagaceae, Quercus (oak) contains more than 400 widely distributed species in Asia, Europe, Africa, and North America (Simeone et al., 2016). Oaks have various utilities, including timber, bioenergy, and dyes production (Sasaki et al., 2014; Wu et al., 2014; Li et al., 2018). According to molecular classification, the genera Quercus has been divided into two subgenera, Quercus and Cerris (Denk and Grimm, 2010; Denk et al., 2017; Deng et al., 2018; Hipp et al., 2018). The subgenera Quercus includes five groups (sections): Ponticae, Virentes, Protobalanus (intermediate Oak), Quercus (white oak), and Lobatae (red oak), while Cerris includes three groups (sections): Ilex, Cerris and Cyclobalanopsis (Denk et al., 2017). Within the Quercus genera, the evolutionary profiles of plastid genomes have been elucidated in Q. acutissima, Q. aliena, Q. aquifolioides, Q. baronii, Q. dolicholepis, Q. edithiae, Q. fabri, Q. glauca, and 10 other Quercus plastomes (Li et al., 2021). However, only four species with whole genome sequences have been published, including Q. lobata (Sork et al., 2016), Q. suber (Ramos et al., 2018), Q. robur (Plomion et al., 2016), and Q. acutissima (Fu et al., 2022). Although the genome data of Q. acutissima have been published, the continuity of the assembly still needs improvement (Fu et al., 2022).

As an important ecological and economic tree species, Q. acutissima Carruth is widely distributed in East Asia, especially in southeast China (18° - 41° N, 91° - 123°E) (Li et al., 2018; Yang et al., 2019). The silvics of Q. acutissima is usually mixed or secondary monocultures, which are also distributed in a scattered manner in harsh environments (Aldrich et al., 2003; Zhang et al., 2013). Q. acutissima timber provides excellent building material and charcoal production in many Asian countries, including China, Japan, and Korea (Zhang et al., 2013). At present, research on Q. acutissima is mainly focused on propagation, eco-physiology, selection, and genetic diversity (Dong, 2008; Wang et al., 2009; Liao, 2012; Zhang et al., 2013). In northern China, Q. acutissima forest ecosystems have been degraded due to human disturbance, threatening the species genetic resources (Aldrich et al., 2003; Zhang et al., 2013). Thus, planning breeding and conservation programs for Q. acutissima native populations is crucial, and the understanding of the species genome-wide evolution, gene function, and molecular breeding are important elements to supporting these goal (Greene and Morris, 2001).

Here, the Q. acutissima genome was sequenced and de novo assembled using PacBio long reads, Hi-C reads, and Illumina short reads. We performed structural gene annotation, repetitive sequences identification, and executed comparative genomics with other plant genomes. Our results are expected to improve our understanding of the evolution and diversification of genes in Q. acutissima, laying the foundation for novel genes discovery and ultimately contributing to the development of novel properties for the species breeding programs.

Materials and methods

Plant materials, DNA extraction and genome sequencing

Fresh Q. acutissima leaves were collected from a tree growing in the Shandong Provincial Center of Forest and Grass Germplasm Resources (36.62°N, 117.16°E), immediately frozen in liquid nitrogen, and stored at -80°C until further use. Plant specimens (barcode number SDF1001228) and total genomic DNA (code ld001qa001) were stored in Shandong Provincial Center of Forest and Grass Germplasm Resources. Total genomic DNA was extracted from leaf tissue using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. After obtaining high-quality purified genomic DNA samples, PCR free SMRT bell library was constructed and sequenced by PacBio sequencing platform, and we obtained 154.41 Gb of subreads with 160× coverage. We also constructed a Hi-C library and a paired-end library with an insert size of 350 bp and sequenced using the Illumina HiSeq X Ten platform.

Genome assembly, quality evaluation, and construction of pseudomolecule chromosomes

Before Q. acutissima genome de novo assembly, we used high-quality Illumina paired-end reads to estimate the genome size and heterozygosity with genomescope software (Vurture et al., 2017). Four software, including Canu (v2.1.1, default parameters) (Koren et al., 2017), FALCON (Chin et al., 2016), SmartDenovo (Istace et al., 2017), and WTDBG (Ruan and Li, 2019) were used to perform preliminary assembly of the genome. After the assembly of the third generation subreads, due to the presence of sequencing errors, a certain amount of error information existed such as short insertion-deletion mutations (Indel) and single-nucleotide polymorphism (SNP). Thus, we used the Illumina sort reads to polish this genome with BWA (v0.7.9a, parameter, -k 30) (Li and Durbin, 2009), and Pilon software (v1.22, default parameters) (Walker et al., 2014). Additionally, based on the OrthoDB (Kriventseva et al., 2019) database, we performed a BUSCO (version 3.0.1, default parameters) (Simão et al., 2015) assessment using single-copy orthologous genes to confirm the genome assembly quality. Quality control of the alignment reads was performed using the Phase Genomics Hi-C alignment quality control tool and scaffolding was carried out with Phase Genomics Proximo Hi-C genome scaffolding platform to obtain chromosome-level assembly.

Genome annotation

We used a combination of de novo prediction and homology-based searches to annotate the genome tandem and interspersed repeats. First, RepeatModeler software (Flynn et al., 2020) was used to build the de novo repeat sequence library, and then we used RepeatMasker (Tarailo-Graovac and Chen, 2009), and Tandem Repeat Finder (Gary, 1999) software for repeat sequences prediction. Second, based on Repbase (Jurka et al., 2005), we used RepeatMasker to search homologous repeat sequences.

After repetitive sequence masking, we used three methods to predict gene structure. First, homology prediction was conducted by comparing homologous proteins from plant genomes, including Q. lobata (Sork et al., 2016), Q. suber (Ramos et al., 2018), Q. robur (Plomion et al., 2016), Fagus sylvatica (Mishra et al., 2018), and Casuarina equisetifolia (Ye et al., 2018) using Blast v2.2.28 and the GeneWise web resource v2.2.0 (Birney et al., 2004). Second, we used Augustus (Stanke et al., 2004), SNAP (https://github.com/KorfLab/SNAP), and GeneMark (Ter-Hovhannisyan et al., 2008) to ab initio gene prediction. Third, the PASA software (Roberts et al., 2011) was used to predict gene structure by aligning EST/cDNA sequences with the genome. Combining the above results, using the evincemodeler (EVM) (Haas et al., 2008) to integrate the gene set predicted by the three strategies into a nonredundant and more complete gene set.

We used the NCBI protein database, GO (Mi et al., 2019), KEGG (release 84.0) (Kanehisa et al., 2016), NR (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz), PFAM (Finn et al., 2014), and eggNOG-mapper (Cantalapiedra et al., 2021) to annotate gene function. The E-value cutoff was set to 1e-5 for BLAST searches.

Gene families and phylogenetic analysis

We downloaded (https://www.ncbi.nlm.nih.gov/) and performed a comparative genomic investigation of Q. acutissima with Q. robur, Q. mongolica, Q. lobata, Q. variabilis, Q. suber, Castanea mollissima, Castanea crenata, Castanopsis tibetana, Fagus sylvatica, Juglans regia, Cyclocarya paliurus, Carya illinoinensis, Morella rubra, Corylus mandshurica, Carpinus viminea, Betula pendula, and Vitis vinifera. The software OrthoFinder2 v2.3.1 (Emms and Kelly, 2019) was used to identify homoeologous gene clusters. IQ-TREE v1.6.7 (Nguyen et al., 2015) was used to construct a phylogenetic tree based on single copy homoeologous genes. The MAFFT v7.4.07 (Katoh and Standley, 2013) was used to align homoeologs before transforming aligned protein sequences into codon alignment. The concatenated amino acid sequences were trimmed using trimAL v1.4 (Capella-Gutiérrez et al., 2009) with -gt 0.8 -st 0.001 -cons 60. Divergence times were estimated using the MCMCTree software (Yang, 2007) in the PAML v4.9h (Guindon et al., 2010) package with the BRMC method (Sanderson, 2003; Blanc and Wolfe, 2004), and the correction times were taken from the TimeTree (Kumar et al., 2017): 109.0-123.5 MYA split time between V. vinifera and B. pendula, 56.8-95.0 MYA split time between Q. suber and B. pendula, and 35.7-83.5 MYA split time between J. regia and B. pendula. Based on the clustering analysis of gene families and dating, gene family expansion and contraction analyses were performed using CAFÉ (De Bie et al., 2006).

Synteny and WGD analysis

Syntenic blocks containing at least five genes were identified using the python version of MCScan (Huang et al., 2009; Schmutz et al., 2010) between Q. mongolica, Q. variabilis, Q. acutissima, C. mollissima, and C. tibetana. Genome circular plot was produced using Circos (Krzywinski et al., 2009). KaKs_Calculator 2.0 (Wang et al., 2010) was used to calculate Ka, Ks, and the Ka/Ks ratio by implementing the YN model.

GO enrichment analysis

GO enrichment analysis was performed using the R package clusterProfiler (Yu et al., 2012). The p values were adjusted for multiple comparisons using the method of Benjamini and Hochberg (p < 0.05 was considered significant).

Results

Genome sequencing and assembly

We sequenced Q. acutissima genome and generated a total of 154.41 Gb PacBio long reads with N50 of 24,256 bp (Table S1). The genome size and heterozygosity were estimated to be 750 Mb and 2.77% using K-mer analysis, respectively (Figure S1). To accurately assemble the Q. acutissima genome, we compared multiple assembly strategies in the primary step, and based on contiguity metrics including the total number of assembled contigs, N50, contigs’ maximum length, and the best assembly from Canu was selected for further polishing and scaffolding with Hi-C data. The assembled genome size was 957.09 Mb, including 1,507 contigs with an N50 length of 1.20 Mb and 15 scaffolds with N50 length 77.04 Mb (Table 1). The longest 12 scaffolds correspond to 12 pseudo-chromosomes (Figure 1).

Table 1

Assembly features
Number of contigs	1,507
Contig N50 (Mb)	1.20
Number of scaffolds	15
Scaffold N50 (Mb)	77.04
Number of genes	29,889
Average gene length (bp)	4,476.10
Average exons per gene	4.92
Average exon length (bp)	253.60
Average intron length (bp)	824.50
Average Coding sequences length(bp)	1,247.79
Total size of repeat sequences (Mb)	532.33

Quercus acutissima genome assembly statistics.

Figure 1

Assessment of genomic integrity

The completeness and accuracy of the genome assembly were evaluated using BUSCO. The high BUSCO complete ratio (98.00%) corroborated the genome assembly excellent quality (Table S2). The guanine-cytosine (GC) depth analysis showed that there was no obvious left-right chunking in the GC-depth plot (Figure S2) and the average GC content was 35.18% (Table S3). Approximately 99.84% of the Illumina short reads could be successfully mapped to the genome assembly (Figure S3, Table S4). These results suggest that the assembly of the Q. acutissima genome is highly accurate and continuous.

Genome annotation

Through an integrative approach, we identified 546.67 Mb repetitive sequences, accounting for 57.13% of genome (Table 1, Table S5). The Long terminal repeat retrotransposons (LTR-RTs) from the largest proportion (23.07%) of the repeat (Table S6).

A total of 29,889 protein-coding genes were identified, their average lengths and coding sequences were 4,476.10 and 1,247.79 bp, respectively (Table 1). Based on the comparison between predicted gene sets with the annotation databases, a total of 24,689 (82.6%) genes were functionally annotated (Table S7).

Gene family and phylogenetic relationships

To assess the palaeohistory of Q. acutissima, we performed comparative genomic analyses incorporating Q. acutissima along with 16 other genomes and one outgroup (V. vinifera) (Figure 2). Out of the 28,312 gene families, only 10 were found to be unique to the Q. acutissima genome, and fewer than 60 gene families were unique to other Quercus (Table S8). Construction of the phylogenetic tree confirmed the evolutionary relationship within Quercus, and the divergence between Q. variabilis and Q. acutissima was estimated at 3.6 MYA (Figure 2). Expanded gene families provide the raw material for adaptation and trait evolution. We then examined the rates and direction of change in gene family size among taxa using CAFE (Han et al., 2013). The results showed that Q. acutissima exhibited larger numbers of contracted gene families (2,390) than expanded (3,897) (Table S9, Figure 2). These expanded families are mainly related to ion transport, such as ion transport, ion transmembrane transport, inorganic ion transmembrane transport (Table S10), while the contracted gene families were mainly enriched to glycosinolate biosynthetic process, sesquiterpene metabolic and biosynthetic process, monoterpenoid metabolic and biosynthetic process (Table S11).

Figure 2

Whole-genome duplication and synteny analysis

Whole genome duplication (WGD) events are widespread and play a vital role in plant genome adaptation and evolution (Xue et al., 2020), and are an important source of gene family expansion. After multiple sequence alignment of sequences in synteny blocks within Q. acutissima and other species, the synteny analysis showed that Q. acutissima had a 1:1 syntenic relationship with other Fagaceae, and there was little rearrangement of chromosomes, which indicated that the evolution of Fagaceae was very conserved and no independent WGD events occurred in Q. acutissima (Figure 3, Figures S4-S11).

Figure 3

Discussion

Q. acutissima, Fagaceae, is an economically and ecologically important tree species with wide distribution in China (Li et al., 2018; Zhang et al., 2020). Here, we generated a Q. acutissima genome at the chromosome-level. The assembled genome size is approximately 956.9 Mb, which is larger than the genome we assessed using the K-mer method, this may be due to the presence of chimerism in our assembly. The development of PacBio sequencing has resulted in a considerable increase in contig N50 sizes compared to previous sequencing technologies (Wei et al., 2020). The assemble length of contig N50 sizes can represent the genome assembling quality (Yang et al., 2021), consequently, our genome has high assembly contiguity. High heterozygosity and repetition rates are responsible for the inability to assemble high-quality genomes (Gao et al., 2020; Wang et al., 2021). Q. acutissima heterozygous rate was 2.77%, which is higher than that of Q. lobata (1.25%) (Sork et al., 2016) and Q. suber (1.62%) (Ramos et al., 2018). It is worth noting that 98% of complete BUSCO core genes were detected in the assembled genome, which is higher than that of Q. lobata (94%) (Sork et al., 2016) and comparable to Q. suber genome (97%) (Ramos et al., 2018). In summary, Q. acutissima assembly is relatively accurate and complete, which will provide a valuable genome resource for understanding the species evolution and enhance its genetic improvement.

The genus Quercus (Fagaceae), which includes 400-500 species, is distributed in Asia, Africa, Europe, and North America (Simeone et al., 2016; Bent, 2020). As a member in this genus, Q. acutissima genome information can fill genome research gap and promote the species evolutionary biology research. Following the statistical analysis of repeat in the genome, we found that the repeat regions accounted for 57.13%, the numbers of repetitive and ncRNA sequences were relatively high in Q. acutissima compared with other Quercus species. To understand the evolutionary development of Q. acutissima, we analyzed its evolution and divergence times. The syntenic analysis indicated that Q. acutissima did not experienced a recent WGD event. In plants, WGD events can lead to genome size variation, gene family expansion, chromosomal rearrangement, and species evolution (El Baidouri and Panaud, 2013; Wang et al., 2021). We found high collinearity relationship between Q. acutissima and Q. variabilis chromosomes, suggesting the conservative nature of their karyotypes.

In summary, we obtained high-quality Q. acutissima genome sequences using Pacbio, Hi-C and Illumina reads. The development of sequencing technologies, analytical methods, and statistical algorithms continue to promote the efficiency and accuracy of genome sequencing and assembly (Xue et al., 2020; Wei et al., 2020; Wu et al., 2020). Q. acutissima genome includes high quality chromosomal-level assembly and many important genes, offering novel insights into genome evolution, functional innovation, and key regulatory pathways in wood formation and production of high-value metabolites, and providing excellent genetic resources for comparative genome studies among Quercus species.

Funding

This research was funded by the Collection and arrangement of genetic resources and genetic diversity evaluation of Quercus acutissima of Biosafety and Genetic Resources Management Project of State Forestry and Grassland Administration, grant number KJZXSA202111; ‘Collection, Conservation, and Accurate Identification of Forest Tree Germplasm Resources’ of Shandong Provincial Agricultural Elite Varieties Project, grant number 2019LZGC018; Project of National Forest Germplasm Resources Sharing Service Platform Construction and Operation, grant number 2005-DKA21003.

Acknowledgments

We are grateful for the generous grant from the National Engineering Research Center of Tree Breeding and Ecological Restoration and Shandong Provincial Center of Forest and Grass Germplasm Resources that made this work possible.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Statements

Data availability statement

The data presented in the study are deposited in the CNGB Sequence Archive (CNSA, https://db.cngb.org/cnsa/) of China National GeneBank DataBase (CNGBdb) repository, accession number CNP0003530, CNP0002992.

Author contributions

WeiL and WenL designed and supervised the study. DL, XX, BT, CZ, and KQ collected the samples and extracted the genomic DNA and RNA. DL, CZ, KQ, HG and ZZ performed genome assembly and bioinformatics analysis. YE did English editing and retouching. DL wrote the original manuscript. WeiL and WenL reviewed and edited this manuscript. All authors read and approved the final manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2022.1068802/full#supplementary-material

References

1
AldrichP. R.ParkerG. R.WardJ. S.MichlerC. H. (2003). Spatial dispersion of trees in an old-growth temperate hardwood forest over 60 years of succession. For. Ecol. Manage.180, 475–491. doi: 10.1016/s0378-1127(02)00612-6
- CrossRef
- Google Scholar
2
BentJ. S. (2020). Quercus: classification ecology and uses (America: Nova Science Publishers Inc).
- Google Scholar
3
BirneyE.ClampM.DurbinR. (2004). GeneWise and genomewise. Genome Res.14, 988–995. doi: 10.1101/gr.1865504
- CrossRef
- Google Scholar
4
BlancG.WolfeK. H. (2004). Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell16, 1667–1678. doi: 10.1105/tpc.021345
- CrossRef
- Google Scholar
5
CantalapiedraC. P.Hernández-PlazaA.LetunicI.BorkP.Huerta-CepasJ. (2021). eggNOG-mapper v2: functional annotation orthology assignments and domain prediction at the metagenomic scale. Mol. Biol. Evol.38, msab293. doi: 10.1093/molbev/msab293
- CrossRef
- Google Scholar
6
Capella-GutiérrezS.Silla-MartínezJ. M.GabaldónT. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics25, 1972–1973. doi: 10.1093/bioinformatics/btp348
- CrossRef
- Google Scholar
7
ChinC. S.PelusoP.SedlazeckF. J.NattestadM.ConcepcionG. T.ClumA.et al. (2016). Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods13, 1050. doi: 10.1038/nmeth.4035
- CrossRef
- Google Scholar
8
De BieT.CristianiniN.DemuthJ. P.HahnM. W. (2006). CAFE: A computational tool for the study of gene family evolution. Bioinformatics22, 1269–1271. doi: 10.1093/bioinformatics/btl097
- CrossRef
- Google Scholar
9
DengM.JiangX. L.HippA. L.ManosP. S.HahnM. (2018). Phylogeny and biogeography of East Asian evergreen oaks (Quercus section cyclobalanopsis fagaceae): Insights into the Cenozoic history of evergreen broad-leaved forests in subtropical Asia. Mol. Phylogenet Evol.119, 170–181. doi: 10.1016/j.ympev.2017.11.003
- CrossRef
- Google Scholar
10
DenkT.GrimmG. W. (2010). The oaks of western Eurasia: traditional classifications and evidence from two nuclear markers. Taxon59, 351–366. doi: 10.1002/tax.592002
- CrossRef
- Google Scholar
11
DenkT.GrimmG. W.ManosP. S.MinD.HippA. L. (2017). An updated infrageneric classification of the oaks: review of previous taxonomic schemes and synthesis of evolutionary patterns (Germany: Springer International Publishing), 13–38. doi: 10.1007/978-3-319-69099-5_2
- CrossRef
- Google Scholar
12
DongY. (2008). Study on variation among quercus acutissima population and selection of its families and clones (China: Shandong Agricultural University).
- Google Scholar
13
El BaidouriM.PanaudO. (2013). Comparative genomic paleontology across plant kingdom reveals the dynamics of TE-driven genome evolution. Genome Biol. Evol.5, 954–965. doi: 10.1093/gbe/evt025
- CrossRef
- Google Scholar
14
EmmsD. M.KellyS. (2019). OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol.20, 1–14. doi: 10.1186/s13059-019-1832-y
- CrossRef
- Google Scholar
15
FinnR. D.BatemanA.ClementsJ.CoggillP.EberhardtR. Y.EddyS. R.et al. (2014). Pfam: the protein families database. Nucleic Acids Res.42, D222–D230. doi: 10.1093/nar/gkh121
- CrossRef
- Google Scholar
16
FlynnJ. M.HubleyR.GoubertC.GoubertC.RosenJ.ClarkA. G.et al. (2020). RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci.117, 9451–9457. doi: 10.1073/pnas.1921046117
- CrossRef
- Google Scholar
17
FuR.ZhuY.LiuY.FengY.LuR. S.LiY.et al. (2022). Genome-wide analyses of introgression between two sympatric Asian oak species. Nat. Ecol. Evol.6, 924–935. doi: 10.1038/s41559-022-01754-7
- CrossRef
- Google Scholar
18
GaoS.WangB.XieS.XuX.ZhangJ.PeiL.et al. (2020). A high-quality reference genome of wild cannabis sativa. Hortic. Res.7, 73. doi: 10.1038/s41438-020-0295-3
- CrossRef
- Google Scholar
19
GaryB. (1999). Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res.27, 573–580. doi: 10.1093/nar/27.2.573
- CrossRef
- Google Scholar
20
GreeneS. L.MorrisJ. B. (2001). The case for multiple-use plant germplasm collections and a strategy for implementation. Crop Sci.41, 886–892. doi: 10.2135/cropsci2001.413886x
- CrossRef
- Google Scholar
21
GuindonS.DufayardJ. F.LefortV.AnisimovaM.HordijkW.GascuelO. (2010). New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol.3, 307–321. doi: 10.1093/sysbio/syq010
- CrossRef
- Google Scholar
22
HaasB. J.SalzbergS. L.ZhuW.PerteaM.AllenJ. E.OrvisJ.et al. (2008). Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignmentsGenome Biol Vol. 9, R7. doi: 10.1186/gb-2008-9-1-r7
- CrossRef
- Google Scholar
23
HanM. V.ThomasG. W. C.LugomartinezJ.HahnM. W. (2013). Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol. Biol. Evol.30, 1987–1997. doi: 10.1093/molbev/mst100
- CrossRef
- Google Scholar
24
HippA. L.ManosP. S.GonzálezR. A.HahnM.KaprothM.McVayJ. D.et al. (2018). Sympatric parallel diversification of major oak clades in the americas and the origins of Mexican species diversity. New Phytol.217, 439–452. doi: 10.1111/nph.14773
- CrossRef
- Google Scholar
25
HuangS.LiR.ZhangZ.LiL.GuX.FanW.et al. (2009). The genome of the cucumber cucumis sativus l. Nat. Gentic.41, 1275. doi: 10.1038/ng.475
- CrossRef
- Google Scholar
26
IstaceB.FriedrichA.AgataL.FayeS.PayenE.BelucheO.et al. (2017). De novo assembly and population genomic survey of natural yeast isolates with the Oxford nanopore MinION sequencer. Gigascience6, 1–13. doi: 10.1093/gigascience/giw018
- CrossRef
- Google Scholar
27
JurkaJ.KapitonovV. V.PavlicekA.KlonowskiP.KohanyO.WalichiewiczJ. (2005). Repbase update a database of eukaryotic repetitive elements. Cytogenet. Genome Res.110, 462–467. doi: 10.1159/000084979
- CrossRef
- Google Scholar
28
KanehisaM.SatoY.KawashimaM.FurumichiM.TanabeM. (2016). KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res.44, D457–D462. doi: 10.1093/nar/gkv1070
- CrossRef
- Google Scholar
29
KatohK.StandleyD. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol.30, 772–780. doi: 10.1093/molbev/mst010
- CrossRef
- Google Scholar
30
KorenS.WalenzB. P.BerlinK.MillerJ. R.BergmanN. H.PhillippyA. M. (2017). Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res.27, 722–736. doi: 10.1101/gr.215087.116
- CrossRef
- Google Scholar
31
KriventsevaE. V.KuznetsovD.TegenfeldtF.ManniM.DiasR.SimãoF. A.et al. (2019). OrthoDB v10: sampling the diversity of animal plant fungal protist bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res.47, D807–D811. doi: 10.1093/nar/gky1053
- CrossRef
- Google Scholar
32
KrzywinskiM.ScheinJ.BirolI.ConnorsJ.GascoyneR.HorsmanD.et al. (2009). Circos: an information aesthetic for comparative genomics. Genome Res.19, 1639–1645. doi: 10.1101/gr.092759.109
- CrossRef
- Google Scholar
33
KumarS.StecherG.SuleskiM.HedgesS. B. (2017). TimeTree: a resource for timelines timetrees and divergence times. Mol. Biol. Evol.34, 1812–1819. doi: 10.1093/molbev/msx116
- CrossRef
- Google Scholar
34
LiaoJ. (2012). Somatic embryogenesis and rapid propagation technology of quercus acutissima Carr (China: Nanjing Forestry University).
- Google Scholar
35
LiH.DurbinR. (2009). Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics25, 1754–1760. doi: 10.1093/bioinformatics/btp324
- CrossRef
- Google Scholar
36
LiX.LiY.SylvesterS. P.ZangM.El-KassabyY. A.FangY. (2021). Evolutionary patterns of nucleotide substitution rates in plastid genomes of quercus. Ecol. Evol.11, 13401–13414. doi: 10.1002/ece3.8063
- CrossRef
- Google Scholar
37
LiX.LiY.ZangM.LiM.FangY. (2018). Complete chloroplast genome sequence and phylogenetic analysis of quercus acutissima. Int. J. Mol. Sci.19, 2443. doi: 10.3390/ijms19082443
- CrossRef
- Google Scholar
38
MiH.HuangX.MuruganujanA.EbertD.HuangX.ThomasP. D. (2019). PANTHER version 14: More genomes a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res.47, D419–D426. doi: 10.1093/nar/gky1038
- CrossRef
- Google Scholar
39
MishraB.GuptaD. K.PfenningerM.HicklerT.LangerE.NamB.et al. (2018). A reference genome of the European beech (Fagus sylvatica l.). GigaScience7, giy063. doi: 10.1093/gigascience/giy063
- CrossRef
- Google Scholar
40
NguyenL. T.SchmidtH. A.von HaeselerA.MinhB. Q. (2015). IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol.32, 268–274. doi: 10.1093/molbev/msu300
- CrossRef
- Google Scholar
41
PlomionC.AuryJ.AmselemJ.AlaeitabarT.BarbeV.BelserC.et al. (2016). Decoding the oak genome: public release of sequence data assembly annotation and publication strategies. Mol. Ecol. Resour.16, 254–265. doi: 10.1111/1755-0998.12425
- CrossRef
- Google Scholar
42
RamosA. M.UsiéA.BarbosaP.BarrosP. M.CapoteT.ChavesI.et al. (2018). The draft genome sequence of cork oak. Sci. Data.5, 180069. doi: 10.1038/sdata.2018.69
- CrossRef
- Google Scholar
43
RobertsA.PimentelH.TrapnellC.PachterL. (2011). Identification of novel transcripts in annotated genomes using RNA-seq. Bioinformatics27, 2325–2329. doi: 10.1093/bioinformatics/btr355
- CrossRef
- Google Scholar
44
RuanJ.LiH. (2019). Fast and accurate long-read assembly with wtdbg2. Nat. Methods17, 155–158. doi: 10.1038/s41592-019-0669-3
- CrossRef
- Google Scholar
45
SandersonM. J. (2003). R8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics19, 301–302. doi: 10.1093/bioinformatics/19.2.301
- CrossRef
- Google Scholar
46
SasakiC.KushikiY.AsadaC.NakamuraY. (2014). Acetone-butanol-ethanol production by separate hydrolysis and fermentation (SHF) and simultaneous saccharification and fermentation (SSF) methods using acorns and wood chips of quercus acutissima as a carbon source. Ind. Crop Prod.62, 286–292. doi: 10.1016/j.indcrop.2014.08.049
- CrossRef
- Google Scholar
47
SchmutzJ.CannonS. B.SchlueterJ.MaJ.MitrosT.NelsonW.et al. (2010). Genome sequence of the palaeopolyploid soybean. Nature463, 178. doi: 10.1038/nature08670
- CrossRef
- Google Scholar
48
SimãoF. A.WaterhouseR. M.IoannidisP.KriventsevaE. V.ZdobnovE. (2015). BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics31, 3210–3212. doi: 10.1093/bioinformatics/btv351
- CrossRef
- Google Scholar
49
SimeoneM. C.GrimmG. W.PapiniA.VessellaF.CardoniS.TordoniE.et al. (2016). Plastome data reveal multiple geographic origins of quercus group ilex. PeerJ4, e1897. doi: 10.7717/peerj.1897
- CrossRef
- Google Scholar
50
SorkV. L.Fitz-GibbonS. T.PuiuD.CrepeauM.GuggerP. F.ShermanR.et al. (2016). First draft assembly and annotation of the genome of a california endemic oak quercus lobata née (Fagaceae). G3 (Bethesda Md.)6, 3485–3495. doi: 10.1534/g3.116.030411
- CrossRef
- Google Scholar
51
StankeMSteinkampRWaackSMorgensternB. (2004). AUGUSTUS: A web server for gene finding in eukaryotes. Nucleic Acids Res32. doi: 10.1093/nar/gkh379
- CrossRef
- Google Scholar
52
Tarailo-GraovacM.ChenN. (2009). Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinf.4, 4–10. doi: 10.1002/0471250953.bi0410s25
- CrossRef
- Google Scholar
53
Ter-HovhannisyanV.LomsadzeA.ChernoffY. O.BorodovskyM. (2008). Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res.18, 1979–1990. doi: 10.1101/gr.081612.108
- CrossRef
- Google Scholar
54
VurtureG. W.SedlazeckF. J.NattestadM.UnderwoodC. J.FangH.GurtowskiJ.et al. (2017). GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics33, 2202–2204. doi: 10.1093/bioinformatics/btx153
- CrossRef
- Google Scholar
55
WalkerB. J.AbeelT.SheaT.PriestM.AbouellielA.SakthikumarS.et al. (2014). Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PloS One9, e112963. doi: 10.1371/journal.pone.0112963
- CrossRef
- Google Scholar
56
WangJ.XuS.MeiY.CaiS.GuY.SunM.et al. (2021). A high-quality genome assembly of morinda officinalis a famous native southern herb in the lingnan region of southern China. Hortic. Res.8, 135. doi: 10.1038/s41438-021-00551-w
- CrossRef
- Google Scholar
57
WangB.YuM.SunH.ChengX.DanQ.FangY. (2009). Photosynthetic characters of quercus acutissima from different provenances under effects of salt tress. Chin. J. Appl. Ecol.20, 1817–1824.
- Google Scholar
58
WangD.ZhangY.ZhangZ.ZhuJ.YuJ. (2010). KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. genomics. Proteom. Bioinforma.8, 77–80. doi: 10.1016/s1672-0229(10)60008-3
- CrossRef
- Google Scholar
59
WeiS.YangY.YinT. (2020). The chromosome-scale assembly of the willow genome provides insight into salicaceae genome evolution. Hortic. Res.7, 45. doi: 10.1038/s41438-020-0268-6
- CrossRef
- Google Scholar
60
WuS.SunW.XuZ.ZhaiJ.LiX.LiC.et al. (2020). The genome sequence of star fruit (Averrhoa carambola). Hortic. Res.7, 95. doi: 10.1038/s41438-020-0307-3
- CrossRef
- Google Scholar
61
WuT.WangG. G.WuQ.ChengX.YuM.WangW.et al. (2014). Patterns of leaf nitrogen and phosphorus stoichiometry among quercus acutissima provenances across China. Ecol. Complex.17, 32–39. doi: 10.1016/j.ecocom.2013.07.003
- CrossRef
- Google Scholar
62
XueT.ZhengX.ChenD.LiangL.ChenN.HuangZ.et al. (2020). A high-quality genome provides insights into the new taxonomic status and genomic characteristics of cladopus chinensis (Podostemaceae). Hortic. Res.7, 46. doi: 10.1038/s41438-020-0269-5
- CrossRef
- Google Scholar
63
YangZ. (2007). PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol.24, 1586–1591. doi: 10.1093/molbev/msm088
- CrossRef
- Google Scholar
64
YangB.HeF.ZhaoX.WangH.XuX.HeX.et al. (2019). Composition and function of soil fungal community during the establishment of quercus acutissima (Carruth.) seedlings in a cd-contaminated soil. Environ. Manage.246, 150–156. doi: 10.1016/j.jenvman.2019.05.153
- CrossRef
- Google Scholar
65
YangC.MaL.XiaoD.LiuX.JiangX.YingZ.et al. (2021). Chromosome-scale assembly of the sparassis latifolia genome obtained using long-read and Hi-c sequencing. G3 (Bethesda Md.).11, jkab173. doi: 10.1093/g3journal/jkab173
- CrossRef
- Google Scholar
66
YeG.ZhangH.ChenB.NieS.LiuH.GaoW.et al. (2018). De novo genome assembly of the stress tolerant forest species casuarina equisetifolia provides insight into secondary growth. Plant J97, 779–794. doi: 10.1111/tpj.14159
- CrossRef
- Google Scholar
67
YuG.WangL. G.HanY.HeQ. (2012). clusterProfiler: an r package for comparing biological themes among gene clusters. Omics16, 284–287. doi: 10.1089/omi.2011.0118
- CrossRef
- Google Scholar
68
ZhangY. Y.FangY. M.YuM. K.LiX. X.XiaT. (2013). Molecular characterization and genetic structure of quercus acutissima germplasm in China using microsatellites. Mol. Biol. Rep.40, 4083–4090. doi: 10.1007/s11033-013-2486-6
- CrossRef
- Google Scholar
69
ZhangR. S.YangJ.HuH. L.XiaR. X.LiY. P.SuJ. F.et al. (2020). A high level of chloroplast genome sequence variability in the sawtooth oak quercus acutissima. Int. J. Biol. Macromol.152, 340–348. doi: 10.1016/j.ijbiomac.2020.02.201
- CrossRef
- Google Scholar

Summary

Keywords

Quercus acutissima, genome assembly, gene annotation, phylogenetic analysis, gene families

Citation

Liu D, Xie X, Tong B, Zhou C, Qu K, Guo H, Zhao Z, El-Kassaby YA, Li W and Li W (2022) A high-quality genome assembly and annotation of Quercus acutissima Carruth. Front. Plant Sci. 13:1068802. doi: 10.3389/fpls.2022.1068802

Received

13 October 2022

Accepted

02 November 2022

Published

24 November 2022

Volume

13 - 2022

Edited by

Kai-Hua Jia, Shandong Academy of Agricultural Sciences, China

Reviewed by

Ke Qiang Yang, Shandong Agricultural University, China; Yongqi Zheng, Chinese Academy of Forestry, China

Updates

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Wei Li, bjfuliwei@bjfu.edu.cn; Wenqing Li, 190199191@qq.com

This article was submitted to Plant Bioinformatics, a section of the journal Frontiers in Plant Science

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Plant Bioinformatics

ORIGINAL RESEARCH article

A high-quality genome assembly and annotation of Quercus acutissima Carruth

Abstract

Introduction

Materials and methods

Plant materials, DNA extraction and genome sequencing

Genome assembly, quality evaluation, and construction of pseudomolecule chromosomes

Genome annotation

Gene families and phylogenetic analysis

Synteny and WGD analysis

GO enrichment analysis

Results

Genome sequencing and assembly

Assessment of genomic integrity

Genome annotation

Gene family and phylogenetic relationships

Whole-genome duplication and synteny analysis

Discussion

Funding

Acknowledgments

Publisher’s note

Statements

Data availability statement

Author contributions

Conflict of interest

Supplementary material

References

Summary

Outline

Figures

Cite article

Article metrics

ORIGINAL RESEARCH article

A high-quality genome assembly and annotation of Quercus acutissima Carruth

Abstract

Introduction

Materials and methods

Plant materials, DNA extraction and genome sequencing

Genome assembly, quality evaluation, and construction of pseudomolecule chromosomes

Genome annotation

Gene families and phylogenetic analysis

Synteny and WGD analysis

GO enrichment analysis

Results

Genome sequencing and assembly

Assessment of genomic integrity

Genome annotation

Gene family and phylogenetic relationships

Whole-genome duplication and synteny analysis

Discussion

Funding

Acknowledgments

Publisher’s note

Statements

Data availability statement

Author contributions

Conflict of interest

Supplementary material

References

Summary

Outline

Figures

Cite article

Share article

Article metrics