DATA REPORT article

Front. Plant Sci., 05 March 2025

Sec. Functional and Applied Plant Genomics

Volume 16 - 2025 | https://doi.org/10.3389/fpls.2025.1528404

Chromosome-level assembly of the Isodon lophanthoides genome

  • College of Life Science, Nanyang Normal University, Nanyang, Henan, China

1 Introduction

Isodon lophanthoides (Figure 1A) is a perennial herb of the Lamiaceae family distributed across China, India, Myanmar, Nepal, and Vietnam (Wen et al., 2011; Zhang et al., 2022). I. lophanthoides contains a variety of bioactive compounds, such as terpenoids, flavonoids, phenolics, and polysaccharides (Lin et al., 2008; Wen et al., 2011; Zhou et al., 2014). I. lophanthoides is traditionally used to alleviate symptoms of acute jaundice hepatitis, arthritis, cholecystitis, enteritis, pharyngitis, ascariasis, and leprosy (Jiang et al., 2000). This herb is utilized in the preparation of therapeutic teas and instant granules. Additionally, it is used as an ingredient in soups and cooking. This plant plays a significant role in traditional Chinese medicine. It is cultivated extensively as a commercial raw material for the medicinal product “Xihuangcao”. The absence of genomic resources for I. lophanthoides has severely limited its genetic improvement and research on its active components. In this study, we assembled the first chromosome-level genome of I. lophanthoides and identified key genes involved in terpene biosynthesis. This work provides a valuable foundation for genetic improvement and exploring its active compounds’ biosynthetic pathways.

Figure 1

2 Materials and methods

2.1 Material collection and genome sequencing

Young leaves of I. lophanthoides, cultivated at the Artemisia Engineering Technology Center of Nanyang Normal University, were collected to extract high-quality DNA for genome sequencing. After DNA extraction, ultrasonic shearing was applied. The sequencing library was prepared through end-repair, adapter ligation, and amplification, followed by sequencing on the DNB-Seq T7 platform. For long reads, the sequencing library was prepared using the Oxford Nanopore ligation sequencing kit (SQK-LSK109). Sequencing was then performed on an R9 flow cell on the PromethION platform. For Hi-C reads, DNA was fixed in a 4% formaldehyde solution. Digestion was performed with the MboI enzyme, and digested fragments were labeled with biotin-14-dCTP. The crosslinked fragments were then blunt-end repaired and sequenced on the DNB-Seq T7 platform.

2.2 Genome survey

The k-mer method was used to estimate genome size and heterozygosity before genome assembly. The k-mer distribution was calculated from short reads using Jellyfish (Marcais and Kingsford, 2012) with k-mer length set to 21. The genome size and heterozygosity rate was estimated using the GenomeScope2 (Ranallo-Benavidez et al., 2020).

2.3 Genome assembly and gene annotation

Genome assembly was conducted using NextDenovo (Hu et al., 2024) with the overlap-layout-consensus algorithm and default parameters. NextPolish (Hu et al., 2020) was used to polish the genome assembly, applying two rounds of long-read and four rounds of short-read data correction. Hi-C reads were aligned to contigs using Juicer (Durand et al., 2016) and BWA (Jung and Han, 2022), after which the 3D-DNA pipeline (Dudchenko et al., 2017) corrected misassemblies and ordered contigs, integrating them into scaffolds. Manual inspection of scaffolds was then performed using Juicebox Assembly Tools. The final chromosome-length scaffolds were constructed using the 3D-DNA pipeline, with all computational tools run using default parameters. Misassemblies were identified and corrected based on irregular contact patterns in Hi-C data.

Repeat elements in genomes were identified using RepeatModeler (Flynn et al., 2020), and the repeat library was then processed with RepeatMasker (Tarailo-Graovac and Chen, 2009) to annotate repeats across the genome. Transposable elements (TEs) were classified using TEsorter (Zhang et al., 2022). Simple sequence repeat (SSR) markers were predicted using MISA (Beier et al., 2017). Protein-coding genes in the I. lophanthoides genome were identified using an integrative strategy that combined ab initio prediction, protein homology searches, and RNA sequencing data. For ab initio prediction, we used Augustus (Stanke et al., 2006), SNAP (Korf, 2004), GlimmerHMM (Majoros et al., 2004), and GeneMark-ET (Brůna et al., 2020) to identify gene structures in the repeat-masked genome. For protein homology prediction, protein data from sequenced Lamiaceae species were downloaded from the NCBI database and aligned for homology assessment. Additionally, HISAT2 (Kim et al., 2019) was used to map RNA-seq data (PRJNA679679) from various tissues to the genome. PASA was used to predict open reading frames. EVidenceModeler (Haas et al., 2008) integrated results from the three methods, enabling a unified gene prediction. Functional annotation was performed using BLAST (Ye et al., 2006) against NR, SwissProt, eggNOG, InterPro, GO, and KEGG databases. Functional annotations for protein-coding genes were integrated using the above methods.

2.4 Phylogenetic analysis

Protein sequences of A. trichopoda, O. sativa, V. vinifera, T. cacao, A. thaliana, S. lycopersicum, C. canephora, T. grandis, L. japonicus, S. miltiorrhiza, I. rubescens, and A. decumbens were downloaded for subsequent analyses. OrthoVenn3 (Sun et al., 2023) was used for orthology, phylogenetic, and gene family analyses. Pairwise sequence similarity was determined using BLASTP and OrthoMCL (Li et al., 2003) Markov clustering. Phylogenetic trees were constructed using FastTree2 (Price et al., 2010) with the maximum likelihood method and the JTT+CAT model, with node reliability assessed by the SH test. A divergence tree was constructed using single-copy genes and fossil evidence. Divergence times between A. thaliana and T. cacao, S. lycopersicum and C. canephora, A. thaliana and V. vinifera, A. trichopoda and V. vinifera, and L. japonicus and T. grandis were estimated using r8s (Sanderson, 2003). CAFE (Mendes et al., 2020) was used to compare cluster size differences between ancestors and each species to determine gene family expansions and contractions. A random birth-and-death model was applied to assess gene family changes across lineages in the phylogenetic tree. Conditional likelihood was used as the test statistic, with p-values of ≤ 0.01 considered significant.

2.5 Duplicated gene analysis

I. lophanthoides protein sequences were compared to identify homologous blocks. The MCScanX (Wang et al., 2012) pipeline was applied with default settings to map homologous blocks within species. The YN model in KaKs_Calculator 2.0 (Wang et al., 2010) was used to calculate nonsynonymous (Ka) and synonymous (Ks) substitution rates, as well as their ratio (Ka/Ks), for duplicate gene pairs.

3 Data

3.1 Genome assembly

DNA was isolated from I. lophanthoides samples cultivated in the laboratory. Genome size and heterozygosity were estimated using DNB short-read sequencing data. The estimated genome size from short reads was 365,686,342 bp, with a heterozygosity rate of 0.64% (k-mer length = 21). DNA from the same plant was used to assemble the I. lophanthoides genome with a combination of Nanopore and Hi-C technologies (Supplementary Table S1). Assembly with Nanopore long reads produced a genome with a total length of 379,974,750 bp, containing 70 contigs (N50 = 17,265,197 bp). After Hi-C scaffolding, 378,710,417 bp (99.67%) of the sequence was placed into 12 linkage groups (Figure 1A). These linkage groups corresponded to the 12 chromosomes of I. lophanthoides (N50 = 32,786,395 bp). BUSCO assessment showed that the assembly covered 98% of the single-copy orthologs in the embryophyta_odb10 database (1,614 genes; Supplementary Table S2). The consensus quality value (QV) was 35.77, indicating that the genome is highly accurate. The genome’s LAI value is 13.78, reaching the level of the reference genome.

3.2 Gene prediction and gene annotation

50.52% of the genome assembly consisted of repetitive elements, with half of this proportion (30% of the genome) being retrotransposons. This retrotransposon content is similar to that in I. rubescens. In the I. lophanthoides genome, 9.38% of the copies were identified as Copia elements, and 9.93% as Gypsy elements. We further classified transposable elements (TEs) using Tesort (Zhang et al., 2022), identifying 5,880 Helitrons, 4,015 LINEs, 94,428 LTRs, and 13,042 TIRs. Additionally, 153,599 SSR markers were predicted using MISA (Beier et al., 2017).

EVidenceModeler was used to integrate outputs from transcriptome data, ab initio predictions, and homology-based predictions. A total of 30,641 genes were identified, of which 28,541 were protein-coding (Figure 1B). These genes contained an average of 4.8 exons, with an average coding sequence (CDS) length of 1,112 bp (Supplementary Table S3). Functional annotation of 26,492 protein-coding genes (92.8%) was achieved using GO, NR, KEGG, TAIR, and InterProScan databases. A total of 40 genes were associated with terpene metabolism, including 12 genes in the MEA pathway and 28 in the MEP pathway (Supplementary Table S4). Non-coding RNA prediction identified 297 rRNAs, 541 tRNAs, 101 miRNAs, and 341 snRNAs.

3.3 Comparative genomic analysis of I. lophanthoides with other plants

To determine the evolutionary relationships between I. lophanthoides, I. rubescens, and other plant species, a phylogenetic tree was constructed using a total of 427,238 proteins from 12 plant species (Supplementary Table S5). These proteins were clustered into 35,165 orthogroups, of which 282 were single-copy genes (Supplementary Table S6). With known divergence times added, the phylogenetic tree indicated that the common ancestor of I. lophanthoides and I. rubescens diverged approximately 12.988 million years ago (MYA) (Figure 2A). In I. lophanthoides, 48 gene families showed significant expansion and 208 showed significant contraction. The number of expanded gene families was smaller than in I. rubescens. Compared with other Lamiaceae species, I. lophanthoides had the fewest unique gene families (Figure 2B). A transposon burst occurred in I. rubescens gene families around 1 MYA (Figure 2C). The Ks method was used to analyze orthologous gene pairs, revealing no lineage-specific whole-genome duplication events other than the shared peak in Lamiaceae (Figure 2D). Further analysis of selection-affected genes identified 323 genes under positive selection and 2,832 under negative selection (Figure 2E). Genes under positive selection were enriched in processes such as “response to salicylic acid” (Supplementary Figure S3).

Figure 2

Statements

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm.nih.gov/, SRR29855129, SRR28822717, SRR29849713 https://figshare.com/, https://figshare.com/s/791b7bef4735829aaf3e.

Author contributions

YG: Data curation, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. The Foundation of Nanyang Normal University (2023ZX011; 2024PY019), the Key Scientific Research Project of Higher Education Institutions in Henan Province (23B180002), and the Natural Science Foundation of Henan Province (242300420501) provided funding for this project.

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2025.1528404/full#supplementary-material

References

  • 1

    BeierS.ThielT.MünchT.ScholzU.MascherM. J. B. (2017). MISA-web: a web server for microsatellite prediction. Bioinformatics. 33(16), 25832585. doi: 10.1093/bioinformatics/btx198

  • 2

    BrůnaT.LomsadzeA.BorodovskyM. (2020). GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genomics Bioinf.2, lqaa026. doi: 10.1093/nargab/lqaa026

  • 3

    DudchenkoO.BatraS. S.OmerA. D.NyquistS. K.HoegerM.DurandN. C.et al. (2017). De novo assembly of the Aedes aEgypti genome using Hi-C yields chromosome-length scaffolds. Science356, 9295. doi: 10.1126/science.aal3327

  • 4

    DurandN. C.ShamimM. S.MacholI.RaoS. S.HuntleyM. H.LanderE. S.et al. (2016). Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst.3, 9598. doi: 10.1016/j.cels.2016.07.002

  • 5

    FlynnJ. M.HubleyR.GoubertC.RosenJ.ClarkA. G.FeschotteC.et al. (2020). RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci.117, 94519457. doi: 10.1073/pnas.1921046117

  • 6

    HaasB. J.SalzbergS. L.ZhuW.PerteaM.AllenJ. E.OrvisJ.et al. (2008). Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol.9, 122. doi: 10.1186/gb-2008-9-1-r7

  • 7

    HuJ.FanJ.SunZ.LiuS. (2020). NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics36, 22532255. doi: 10.1093/bioinformatics/btz891

  • 8

    HuJ.WangZ.SunZ.HuB.AyoolaA. O.LiangF.et al. (2024). NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol.25, 119. doi: 10.1186/s13059-024-03252-4

  • 9

    JiangB.LuZ.-Q.ZhangH.-J.ZhaoQ.-S.SunH.-D. (2000). Diterpenoids from isodon lophanthoides. Fitoterapia71, 360364. doi: 10.1016/S0367-326X(00)00126-X

  • 10

    JungY.HanD. (2022). BWA-MEME: BWA-MEM emulated with a machine learning approach. Bioinformatics38, 24042413. doi: 10.1093/bioinformatics/btac137

  • 11

    KimD.PaggiJ. M.ParkC.BennettC.SalzbergS. L. (2019). Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol.37, 907915. doi: 10.1038/s41587-019-0201-4

  • 12

    KorfI. (2004). Gene finding in novel genomes. BMC Bioinf.5, 19. doi: 10.1186/1471-2105-5-59

  • 13

    LiL.StoeckertC. J.RoosD. S. (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res.13, 21782189. doi: 10.1101/gr.1224503

  • 14

    LinC.-Z.ZhangC.-X.XiongT.-Q.FengY.-L.ZhaoZ.-X.LiaoQ.-F.et al. (2008). Gerardianin A, a new abietane diterpenoid from Isodon lophanthoides var. gerardianus. J. Asian Natural Products Res.10, 841844. doi: 10.1080/10286020802156564

  • 15

    MajorosW. H.PerteaM.SalzbergS. L. (2004). TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics20, 28782879. doi: 10.1093/bioinformatics/bth315

  • 16

    MarcaisG.KingsfordC. (2012). Jellyfish: A fast k-mer counter. Tutorialis e Manuais.1, 18.

  • 17

    MendesF. K.VanderpoolD.FultonB.HahnM. W. (2020). CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics36, 55165518. doi: 10.1093/bioinformatics/btaa1022

  • 18

    PriceM. N.DehalP. S.ArkinA. P. (2010). FastTree 2–approximately maximum-likelihood trees for large alignments. PloS One5, e9490. doi: 10.1371/journal.pone.0009490

  • 19

    Ranallo-BenavidezT. R.JaronK. S.SchatzM. C. (2020). GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun.11 (1), 1432. doi: 10.1038/s41467-020-14998-3

  • 20

    SandersonM. J. (2003). r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics19, 301302. doi: 10.1093/bioinformatics/19.2.301

  • 21

    StankeM.KellerO.GunduzI.HayesA.WaackS.MorgensternB. (2006). AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res.34, W435W439. doi: 10.1093/nar/gkl200

  • 22

    SunJ.LuF.LuoY.BieL.XuL.WangY. (2023). OrthoVenn3: an integrated platform for exploring and visualizing orthologous data across genomes. Nucleic Acids Res.51, W397W403. doi: 10.1093/nar/gkad313

  • 23

    Tarailo-GraovacM.ChenN. (2009). Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinf.25, 4.10. doi: 10.1002/0471250953.2009.25.issue-1

  • 24

    WangD.ZhangY.ZhangZ.ZhuJ.YuJ. (2010). KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. Genomics Proteomics Bioinf.8, 7780. doi: 10.1016/S1672-0229(10)60008-3

  • 25

    WangY.TangH.DebarryJ. D.TanX.LiJ.WangX.et al. (2012). MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res.40, e49e49. doi: 10.1093/nar/gkr1293

  • 26

    WenL.LinL.YouL.YangB.JiangG.ZhaoM. (2011). Ultrasound-assited extraction and structural identification of polysaccharides from Isodon lophanthoides var. gerardianus (Bentham) H. Hara. Carbohydr. Polymers85, 541547. doi: 10.1016/j.carbpol.2011.03.003

  • 27

    YeJ.McginnisS.MaddenT. L. (2006). BLAST: improvements for better sequence analysis. Nucleic Acids Res.34, W6W9. doi: 10.1093/nar/gkl164

  • 28

    ZhangR. G.LiG. Y.WangX. L.DainatJ.WangZ. X.OuS.et al. (2022). TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. Horticulture Res.9, uhac017. doi: 10.1093/hr/uhac017

  • 29

    ZhouW.XieH.XuX.LiangY.WeiX. (2014). Phenolic constituents from Isodon lophanthoides var. graciliflorus and their antioxidant and antibacterial activities. J. Funct. Foods6, 492498. doi: 10.1016/j.jff.2013.11.015

Summary

Keywords

genome, Chinese herbal medicine, Isodon lophanthoides, nanopore sequence, Hi-C assembly

Citation

Gao Y (2025) Chromosome-level assembly of the Isodon lophanthoides genome. Front. Plant Sci. 16:1528404. doi: 10.3389/fpls.2025.1528404

Received

14 November 2024

Accepted

12 February 2025

Published

05 March 2025

Volume

16 - 2025

Edited by

Yi-Hong Wang, University of Louisiana at Lafayette, United States

Reviewed by

Kang Zhang, Chinese Academy of Agricultural Sciences, China

Achraf El Allali, Mohammed VI Polytechnic University, Morocco

Updates

Copyright

*Correspondence: Yubang Gao,

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics