Introduction
Chinese balloon flower (Platycodon grandiflorus) is the sole species in the genus Platycodon within the Campanulaceae family. The typical blue-purple or white flowers of P. grandiflorus are frequently used for ornamental purposes (Lv et al., 2021). As a traditional oriental medicine used to treat chronic inflammatory diseases, P. grandiflorus roots have a range of pharmacological activities, including expectorant, antitussive, anti-inflammatory, immunoregulatory and anti-tumor effects (Choi et al., 2010; Nyakudya et al., 2014; Buchwald et al., 2020; Ke et al., 2020; Lee et al., 2020). The dried root, Platycodi radix, is officially listed as a traditional herbal medicine in the Chinese, Korean and Japanese Pharmacopoeias (Su et al., 2021). Platycodi radix is also pickled in northeast China and made into kimchi in the Korean Peninsula. Market demand for P. grandiflorus follows its development and application in medicine, food, health products, cosmetics, ornamental horticulture and other fields (Ji et al., 2020), and its market prospects are promising.
Over 100 secondary metabolites have been isolated from P. grandiflorus, including triterpenoid saponins, flavonoids, polyphenols and polysaccharides (Zhang et al., 2015; Qiu et al., 2019; Huang et al., 2021). So far, the pharmacology and metabolic pathways of the main active ingredients, the triterpenoid saponins, have been studied (Kim et al., 2020; Kim et al., 2021; Yu et al., 2021). However, the molecular basis of the biochemical pathways for P. grandiflorus secondary metabolites remains poorly understood overall, which hinders molecular breeding and metabolic engineering of P. grandiflorus towards increased production and utilization of its natural products. A high-quality genome assembly of P. grandiflorus will significantly accelerate the genetic characterization of secondary metabolic pathways and their regulatory mechanisms, as well as genome-assisted breeding.
Previously, a draft genome sequence of P. grandiflorus (2n = 2x = 18) was assembled from Illumina short reads, yielding a fragmented assembly with a scaffold N50 of 277 kb (Kim et al., 2020). In this study, we assembled and annotated a chromosome-scale reference genome for P. grandiflorus cultivar XJD. This genome assembly has a total length of 622.86 Mb anchored to nine chromosomes with high contiguity (contig N50 = 29.34 Mb, scaffold N50 = 65.83 Mb), representing a significant improvement over the previously published draft genome of P. grandiflorus (Kim et al., 2020). The chromosome-scale genome assembly will advance our understanding of genome function and evolution of P. grandiflorus, and facilitate its molecular breeding and metabolic engineering.
Results and Discussion
Genome Assembly
To produce a chromosome-level genome assembly of P. grandiflorus cultivar XJD, we generated about 73 Gb of Nanopore long reads with an average read length of 24 kb, 112 Gb of Illumina paired-end short reads of 150 bp, and 311 Gb of high-throughput chromatin conformation capture (Hi-C) sequencing data. The P. grandiflorus genome was estimated to be 642.38 Mb in length with a heterozygosity rate of 0.92% and a repeat content of 60% based on K-mer analysis of Illumina reads (Supplementary Table S1, Supplementary Figure S1). Nanopore long reads were first used to produce the draft assembly with NextDenovo, which was 622.86 Mb with a contig N50 of 29.34 Mb (Supplementary Table S2) after base correction by Pilon using Illumina reads. The quality of the genome assembly was evaluated by mapping Illumina short reads to the assembly: 99.3% of short reads mapped, covering 96.8% of the assembled genome. Furthermore, BUSCO analysis showed that the genome assembly captured 98.1% complete BUSCOs, including 95.5% single-copy and 2.6% duplicated (Supplementary Table S3), indicating that the genome assembly had high completeness.
Hi-C data were then used to anchor the assembled contigs into individual chromosomes using ALLHiC (Zhang et al., 2019) and Juicebox (Robinson et al., 2018), yielding nine pseudomolecules ranging from 47.09 to 104.37 Mb and accounting for 95% of the assembly. The Hi-C contact map showed that the nine pseudochromosomes could be distinguished clearly (Figure 1; Supplementary Table S4), consistent with the karyotype (2n = 2x = 18) reported in the literature (Yang et al., 2016). The final genome assembly of P. grandiflorus was 622.86 Mb, with a contig N50 of 29.34 Mb and a scaffold N50 of 65.83 Mb. Its contiguity is much higher than that of the previously reported P. grandiflorus (Jangbaek-doraji cultivar) genome assembly (Kim et al., 2020), which had a scaffold N50 of only 0.277 Mb (Supplementary Table S2). Whole-genome sequence comparison showed that the two genome assemblies aligned well: 4,815 scaffolds of the Jangbaek-doraji assembly could be aligned to 99 scaffolds (95% anchored to nine chromosomes) of our XJD assembly (Supplementary Figure S2).
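The contig and scaffold N50 values quoted above follow the usual definition: the length L such that sequences of length at least L together cover half of the assembly. A minimal Python sketch of that calculation is given here for illustration; the function name `n50` and the toy lengths are hypothetical and are not taken from the XJD assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical toy assembly (arbitrary units), not the XJD contigs.
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))  # prints 80
```

Because N50 depends only on the sorted length distribution, the same function applies to contigs and to scaffolds.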
FIGURE 1

Overview of the chromosome-level Platycodon grandiflorus genome assembly. (A) P. grandiflorus genomic features. Track a is the circular representation of the nine pseudochromosomes. Tracks b-d represent the distribution of gene density, GC density, and repeat density, respectively, with densities calculated in 100 kb windows. Track e shows syntenic blocks identified within the P. grandiflorus genome. (B) Hi-C interaction heatmap for the P. grandiflorus genome.
Genome Annotation
We then annotated the genome by combining ab initio prediction, protein homology and transcriptome data from leaves, roots and stems (Methods). The annotation identified 360.46 Mb of repeat sequences in the P. grandiflorus genome, accounting for 57.87% of the genome. The top two categories of repetitive elements were long terminal repeats (LTRs: 51.2%) and DNA elements (2.64%). A total of 22,358 protein-coding genes were predicted in the genome, 96.91% of which were assigned a predicted function by alignment against a library of known proteins from related plant species (Supplementary Table S5). Furthermore, non-coding RNAs were predicted across the P. grandiflorus genome, detecting a total of 1,867 microRNAs (miRNAs), 989 transfer RNAs (tRNAs), 780 ribosomal RNAs (rRNAs), and 1,114 small nuclear RNAs (snRNAs).
Comparative Phylogenomics of P. grandiflorus
To determine the evolutionary relationships between P. grandiflorus and other species, we identified 1,436 single-copy orthologs from 10 representative plant species using OrthoMCL (Li et al., 2003) (Figure 2A). Protein sequence alignments of these orthologs were generated by MUSCLE (Edgar, 2004) and used to build a phylogenetic tree with Oryza sativa as the outgroup (Figure 2B). Mikania micrantha, Helianthus annuus and Lactuca sativa were most closely related to P. grandiflorus, with a divergence time of around 73.8 million years ago (Mya) (Figure 2B). Gene family evolution analysis using CAFE on the 10 plant species suggested that P. grandiflorus has 27 significantly expanded and 64 significantly contracted gene families (Figure 2C). Expanded gene families were enriched in 19 GO categories and 12 KEGG pathways, most of which were related to the biosynthesis of secondary metabolites such as brassinosteroid, flavonoid, stilbenoid, and gingerol, and to signaling pathways such as the MAPK pathway (Supplementary Tables S6 and S7). Notably, P. grandiflorus contained 1,079 species-specific gene families comprising 1,914 genes relative to M. micrantha, H. annuus and L. sativa (Figure 2D). GO enrichment analysis of these species-specific genes was then performed (Supplementary Table S8). Positively selected genes in P. grandiflorus were identified by comparison with H. annuus and M. micrantha. GO and KEGG analyses showed that the positively selected genes were significantly involved in DNA repair, cellular response to stress and stimulus, DNA metabolic process, nucleic acid metabolic process, and DNA recombination, among others (Supplementary Tables S9 and S10).
FIGURE 2

Platycodon grandiflorus phylogenomics. (A) The distribution of single-copy, multiple-copy, unique, and other genes in the 10 plant species. (B) Phylogenetic tree of the 10 plant species. The blue numbers denote divergence time of each node (MYA: million years ago). (C) Expansion and contraction in gene families of the 10 plant species. (D) Venn diagram represents the common and unique gene families among four closely related plants.
Materials and Methods
Plant Materials, Library Construction, and Sequencing
Fresh leaf, stem and root samples were collected from four-week-old seedlings of P. grandiflorus cultivar XJD grown in a plant growth chamber with a 16-h light photoperiod. The tissues were flash-frozen in liquid nitrogen and used for total genomic DNA or RNA extraction. Total genomic DNA of P. grandiflorus leaves was extracted using a DNeasy Plant Mini Kit (Qiagen), followed by PCR-free library construction using the Illumina TruSeq DNA PCR-Free Library Preparation Kit following the manufacturer’s instructions. The libraries were sequenced on the Illumina HiSeq X Ten platform to generate 150 bp paired-end reads, which were used to perform the genome survey, polish the genome assembly, and evaluate the quality of the assemblies.
For ONT and Hi-C sequencing, fresh young leaves were used for DNA isolation and library construction. For ONT sequencing, total genomic DNA was extracted from leaf samples using the CTAB method. ONT library preparation involved fragment repair, ligation reactions, quantification, and library construction. Single-molecule real-time sequencing was then carried out on the Nanopore PromethION sequencer to obtain raw data, which were error-corrected to obtain high-fidelity sequence data. The Hi-C sequencing libraries were generated following a standard procedure described previously (Rao et al., 2014), involving DNA crosslinking, restriction enzyme digestion, end filling and biotin labeling, ligation, DNA purification, and capture using antibody. The Hi-C libraries were subjected to quality control before being sequenced on the Illumina HiSeq X Ten platform. For transcriptome sequencing, total RNA was extracted from leaves, stems and roots of P. grandiflorus using the Plant RNA Purification Reagent (Qiagen) according to the manufacturer’s instructions. RNA-seq transcriptome libraries were prepared using the TruSeq RNA Sample Preparation Kit (Illumina), and sequencing was performed on an Illumina HiSeq X Ten platform.
De Novo Genome Assembly
K-mer frequency analysis was performed using Jellyfish V2.0 (Marçais and Kingsford, 2011) to estimate the P. grandiflorus genome size, heterozygosity and repeat content. NextDenovo (https://github.com/Nextomics/NextDenovo) was used to assemble the P. grandiflorus genome from ONT long reads, and the Nanopore-assembled genome was then polished with the Illumina DNA short reads using NextPolish V1.3.1 (Hu et al., 2020) with default parameters to improve base accuracy. Next, ALLHiC V0.9.8 (Zhang et al., 2019) was used with default parameters to reorder and anchor the preliminarily assembled contigs into chromosomes based on Hi-C data. Finally, we used Juicebox V1.1 (Robinson et al., 2018) to adjust the heatmap and produce the chromosome-level version of the genome. To assess the accuracy and completeness of the assemblies, Illumina clean reads were mapped to our assembly using BWA (Li and Durbin, 2009). In addition, BUSCO (Simão et al., 2015) was used to assess the completeness of the genome assembly.
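The text does not state how genome size was derived from the K-mer counts. One common approximation, shown here only as a hedged sketch, divides the total number of counted K-mers by the depth of the main peak of the K-mer depth histogram. The function name, the `min_depth` cutoff and the toy histogram are all hypothetical; for a heterozygous genome, a lower heterozygous peak may be present and the peak would need to be chosen with more care than this sketch does.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a K-mer depth histogram.

    histogram maps K-mer depth -> number of distinct K-mers at that depth
    (the two columns of a typical K-mer histogram table).
    Depths below min_depth are treated as sequencing errors and ignored
    (min_depth is an assumed cutoff, not a value from the paper).
    """
    kept = {depth: count for depth, count in histogram.items()
            if depth >= min_depth}
    if not kept:
        raise ValueError("no K-mers at or above min_depth")
    # Depth with the most distinct K-mers is taken as the main peak.
    peak_depth = max(kept, key=kept.get)
    # Total K-mer occurrences = sum over depths of depth * distinct K-mers.
    total_kmers = sum(depth * count for depth, count in kept.items())
    return total_kmers / peak_depth, peak_depth


# Hypothetical toy histogram, not the P. grandiflorus data.
toy_histogram = {1: 1000, 2: 300, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50}
size, peak = estimate_genome_size(toy_histogram)
print(size, peak)  # prints 360.0 20
```

Dedicated tools additionally model heterozygosity and repeats, which a plain peak-based estimate like this one does not.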
Genome Annotation
Genome annotation mainly included repetitive sequence annotation, gene annotation and non-coding RNA annotation. First, transcriptome read assemblies were generated with Trinity (Grabherr et al., 2013) for the genome annotation. To optimize the genome annotation, the RNA-Seq reads from different tissues were aligned to the draft genome using Hisat2 (Kim et al., 2015) with default parameters to identify exon regions and splice positions. The alignment results were then used as input for Stringtie (Pertea et al., 2015) with default parameters for genome-based transcript assembly.
Repeat sequences were annotated using both homology-based and ab initio approaches. Tandem repeats were extracted ab initio using Tandem Repeats Finder (Benson, 1999). RepeatModeler (Flynn et al., 2020), RepeatScout (Price et al., 2005), and LTR-Finder (Xu and Wang, 2007) were applied with default parameters to construct an ab initio repeat element library, and RepeatMasker (Tarailo-Graovac and Chen, 2009) was used to annotate repetitive elements with this library. RepeatMasker and RepeatProteinMask were used to search the genome sequence for known repetitive elements, with the genome sequences used as queries against the Repbase database (Jurka, 2000).
For gene structure prediction, Augustus (Stanke et al., 2008), GlimmerHMM (Majoros et al., 2004) and SNAP (Korf, 2004) were used for de novo prediction. Blast (Kent, 2002) and Genewise (Birney et al., 2004) were used for homology-based annotation. EvidenceModeler (EVM; Haas et al., 2008) was then applied to integrate the homology-based, de novo and transcriptome-based predictions into a non-redundant, more complete gene set. Finally, we used PASA (Haas et al., 2003), combined with the transcriptome assembly results, to correct the EVM annotation and to add UTRs, alternative splicing and other information, yielding the final gene set. This final gene set was compared to public databases, including SwissProt (Bairoch and Apweiler, 2000), NR (Marchler-Bauer et al., 2011), Pfam (Griffiths-Jones et al., 2005), KEGG (Kanehisa et al., 2013), GO (Ashburner et al., 2000) and InterPro (Zdobnov and Apweiler, 2001), for functional annotation of protein-coding genes. In addition, we predicted different classes of non-coding RNAs. The tRNAs were predicted using tRNAscan-SE (Chan and Lowe, 2019). Because rRNAs are highly conserved, we predicted rRNA sequences using BLAST. Other ncRNAs were identified by searching against the Rfam database with default parameters using the Infernal software (Griffiths-Jones et al., 2005).
Phylogenomic Analysis
Synteny analysis was conducted using MCScanX (Wang et al., 2012) applied to BLASTp results of P. grandiflorus protein sequences. For the phylogenetic analysis, OrthoMCL (Li et al., 2003) was first used to detect multi-copy and single-copy gene families among P. grandiflorus and other representative species. All single-copy gene families were then subjected to multiple sequence alignment using MUSCLE (Edgar, 2004), and the resulting alignments were concatenated into a super alignment matrix. RAxML (Stamatakis, 2014) was used to construct the species phylogenetic tree, with Oryza sativa as the outgroup and the bootstrap value set to 100. MCMCTREE in PAML (Yang, 1997) was used to estimate divergence times. The calibration points were: Solanum lycopersicum - Helianthus annuus (95–106 Mya), Vitis vinifera - Arabidopsis thaliana (105–115 Mya), P. grandiflorus - Vitis vinifera (111–131 Mya), and P. grandiflorus - Oryza sativa (148–173 Mya). The calibration points were taken from the TimeTree website (Sudhir et al., 2017).
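The concatenation of per-family alignments into a super alignment matrix can be illustrated with a short Python sketch. It is not the script used in this study: the function name, the dictionary layout and the gap-padding of absent species are assumptions made for illustration (with strictly single-copy families, every species is present in every alignment and no padding is needed).

```python
def concatenate_alignments(alignments, species):
    """Join per-gene alignments into one supermatrix.

    alignments: list of dicts mapping species name -> aligned sequence,
                one dict per single-copy gene family.
    species:    species names that define the rows of the supermatrix.
    """
    supermatrix = {name: [] for name in species}
    for index, gene in enumerate(alignments):
        # All rows of one alignment must have the same number of columns.
        widths = {len(seq) for seq in gene.values()}
        if len(widths) != 1:
            raise ValueError(f"alignment {index} has rows of unequal length")
        width = widths.pop()
        for name in species:
            # Assumption: pad with gaps if a species is absent.
            supermatrix[name].append(gene.get(name, "-" * width))
    return {name: "".join(parts) for name, parts in supermatrix.items()}


# Hypothetical toy alignments for two species and two gene families.
toy = [{"sp1": "MK-L", "sp2": "MKAL"}, {"sp1": "GG", "sp2": "G-"}]
print(concatenate_alignments(toy, ["sp1", "sp2"]))
```

The resulting dictionary has one row per species, each row being the gene alignments joined in a fixed order, which is the input shape expected by tree-building programs once written out in an alignment format.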
Gene Family Analysis
The CAFE software (Han et al., 2013) was used to analyze gene family expansion and contraction based on the estimated divergence times and phylogenetic relationships. To avoid false positives, the CAFE results were filtered: gene families were considered significantly expanded or contracted only when both the family-wide p-value and the Viterbi p-value were < 0.05. Enrichment analyses based on GO and KEGG annotations were performed to identify the functional implications of the expanded and contracted gene families.
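The two-threshold filter described above can be sketched in Python as follows. The `FamilyResult` record and its field names are hypothetical and do not reproduce the CAFE output format; the sketch assumes the relevant values have already been parsed for the P. grandiflorus branch.

```python
from dataclasses import dataclass


@dataclass
class FamilyResult:
    """Hypothetical per-family record parsed from a CAFE run."""
    family_id: str
    family_p: float   # family-wide p-value
    viterbi_p: float  # Viterbi p-value on the branch of interest
    change: int       # gene-count change on that branch (+ gain, - loss)


def significant_families(results, alpha=0.05):
    """Split families passing both p-value cutoffs into expanded/contracted."""
    expanded, contracted = [], []
    for result in results:
        if result.family_p < alpha and result.viterbi_p < alpha:
            if result.change > 0:
                expanded.append(result.family_id)
            elif result.change < 0:
                contracted.append(result.family_id)
    return expanded, contracted


# Hypothetical toy records, not results from this study.
toy = [
    FamilyResult("fam1", 0.001, 0.010, +4),
    FamilyResult("fam2", 0.030, 0.200, +2),  # fails the Viterbi cutoff
    FamilyResult("fam3", 0.010, 0.020, -3),
]
print(significant_families(toy))
```

Requiring both p-values to pass is stricter than using the family-wide p-value alone, because a family can vary significantly across the tree without the change being located on the branch of interest.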
Positive Selection Analysis
The protein sequences of single-copy gene families were extracted and aligned by MUSCLE (Edgar, 2004). The Codeml program of the PAML package was applied for positive selection analysis using the branch-site model, with H. annuus and M. micrantha as the background branches. The likelihood ratio test was used to detect candidates that underwent positive selection, with a p-value cutoff of 0.05. Fisher’s test and FDR correction (q-value < 0.05) were used for functional enrichment analysis of these positively selected genes.
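The two statistical steps named above can be sketched in Python. Both parts rest on assumptions that the text does not state: the likelihood ratio statistic is compared against a chi-square distribution with one degree of freedom (a common choice for the branch-site test), and the FDR correction is taken to be the Benjamini-Hochberg procedure. Function names and the toy values are hypothetical.

```python
import math


def lrt_p_value(lnl_null, lnl_alt):
    """Likelihood ratio test, assuming a chi-square null with 1 df.

    lnl_null / lnl_alt are the log-likelihoods of the null and
    alternative models. For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, math.erfc(math.sqrt(statistic / 2.0))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical log-likelihoods for one gene, not values from this study.
stat, p = lrt_p_value(lnl_null=-1000.0, lnl_alt=-996.0)
print(stat, p < 0.05)

# Hypothetical enrichment p-values for four terms.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

In this sketch a gene would be reported as a candidate when its p-value is below 0.05, and an enriched term would be kept when its q-value is below 0.05, matching the cutoffs given in the text.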
Statements
Data availability statement
The whole genome sequence data reported in this paper have been deposited in the Genome Warehouse in National Genomics Data Center (BioProject: PRJCA003843), Beijing Institute of Genomics, Chinese Academy of Sciences/China National Center for Bioinformation (GWH: GWHARYT00000000.1) publicly accessible at https://ngdc.cncb.ac.cn/gwh. The raw sequencing data for the ONT long reads, Illumina short reads, Hi-C Illumina and RNA-seq reads have been deposited in the Genome Sequence Archive at the National Genomics Data Center (GSA: CRA003503) publicly accessible at http://bigd.big.ac.cn/gsa. The genome annotation has been deposited in https://doi.org/10.6084/m9.figshare.19093331.v1.
Author contributions
Project design and oversight: LG; Sample collection and curation: YJ; Conducting experiments and data analysis: YJ, SC, LZ, MX, ZS; Figure and table preparation: YJ, WC, SC; Result interpretation and discussion: YJ, PZ, LG; Manuscript writing and revision: YJ, WC, PZ, LG; Funding acquisition: LG, YJ. All authors read and approved the final version of this manuscript.
Funding
This project is supported by the National Natural Science Foundation of China (Grant No. 31970317), Chinese Postdoctoral Research Foundation (Grant No. 2020M683514). LG is also supported by a faculty startup package from Peking University Institute of Advanced Agricultural Sciences.
Acknowledgments
The authors would also like to thank Dr. Bo Wang at Xi’an Jiaotong University for technical assistance, and anonymous reviewers for their comments and suggestions to improve this manuscript.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2022.869784/full#supplementary-material
References
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., et al. (2000). Gene Ontology: Tool for the Unification of Biology. Nat. Genet. 25 (1), 25–29. doi:10.1038/75556
Bairoch, A., and Apweiler, R. (2000). The SWISS-PROT Protein Sequence Database and its Supplement TrEMBL in 2000. Nucleic Acids Res. 28 (1), 45–48. doi:10.1093/nar/28.1.45
Benson, G. (1999). Tandem Repeats Finder: a Program to Analyze DNA Sequences. Nucleic Acids Res. 27 (2), 573–580. doi:10.1093/nar/27.2.573
Birney, E., Clamp, M., and Durbin, R. (2004). GeneWise and Genomewise. Genome Res. 14 (5), 988–995. doi:10.1101/gr.1865504
Buchwald, W., Szulc, M., Baraniak, J., Derebecka, N., Kania-Dobrowolska, M., Piasecka, A., et al. (2020). The Effect of Different Water Extracts from Platycodon Grandiflorum on Selected Factors Associated with Pathogenesis of Chronic Bronchitis in Rats. Molecules 25 (21), 5020. doi:10.3390/molecules25215020
Chan, P. P., and Lowe, T. M. (2019). tRNAscan-SE: Searching for tRNA Genes in Genomic Sequences. Methods Mol. Biol. 1962, 1–14. doi:10.1007/978-1-4939-9173-0_1
Choi, Y. H., Yoo, D. S., Cha, M.-R., Choi, C. W., Kim, Y. S., Choi, S.-U., et al. (2010). Antiproliferative Effects of Saponins from the Roots of Platycodon Grandiflorum on Cultured Human Tumor Cells. J. Nat. Prod. 73 (11), 1863–1867. doi:10.1021/np100496p
Edgar, R. C. (2004). MUSCLE: Multiple Sequence Alignment with High Accuracy and High Throughput. Nucleic Acids Res. 32 (5), 1792–1797. doi:10.1093/nar/gkh340
Flynn, J. M., Hubley, R., Goubert, C., Rosen, J., Clark, A. G., Feschotte, C., et al. (2020). RepeatModeler2 for Automated Genomic Discovery of Transposable Element Families. Proc. Natl. Acad. Sci. USA 117 (17), 9451–9457. doi:10.1073/pnas.1921046117
Grabherr, M., Haas, B., Yassour, M., Levin, J., Thompson, D., Amit, I., et al. (2013). Trinity: Reconstructing a Full-Length Transcriptome without a Genome from RNA-Seq Data. Nat. Biotechnol. 29 (7), 644–652.
Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy, S. R., and Bateman, A. (2005). Rfam: Annotating Non-coding RNAs in Complete Genomes. Nucleic Acids Res. 33, D121–D124. doi:10.1093/nar/gki081
Haas, B. J., Delcher, A., Mount, S., Wortman, J., Smith, R., Hannick, L., et al. (2003). Improving the Arabidopsis Genome Annotation Using Maximal Transcript Alignment Assemblies. Nucleic Acids Res. 31 (19), 5654–5666. doi:10.1093/nar/gkg770
Haas, B. J., Salzberg, S. L., Zhu, W., Pertea, M., Allen, J. E., Orvis, J., et al. (2008). Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9 (1), R7. doi:10.1186/gb-2008-9-1-r7
Han, M. V., Thomas, G. W. C., Lugo-Martinez, J., and Hahn, M. W. (2013). Estimating Gene Gain and Loss Rates in the Presence of Error in Genome Assembly and Annotation Using CAFE 3. Mol. Biol. Evol. 30 (8), 1987–1997. doi:10.1093/molbev/mst100
Hu, J., Fan, J., Sun, Z., and Liu, S. (2020). NextPolish: a Fast and Efficient Genome Polishing Tool for Long-Read Assembly. Bioinformatics 36 (7), 2253–2255. doi:10.1093/bioinformatics/btz891
Huang, W., Zhou, H., Yuan, M., Lan, L., Hou, A., and Ji, S. (2021). Comprehensive Characterization of the Chemical Constituents in Platycodon Grandiflorum by an Integrated Liquid Chromatography-Mass Spectrometry Strategy. J. Chromatogr. A 1654, 462477. doi:10.1016/j.chroma.2021.462477
Ji, M.-Y., Bo, A., Yang, M., Xu, J.-F., Jiang, L.-L., Zhou, B.-C., et al. (2020). The Pharmacological Effects and Health Benefits of Platycodon grandiflorus-A Medicine Food Homology Species. Foods 9 (2), 142. doi:10.3390/foods9020142
Jurka, J. (2000). Repbase Update: a Database and an Electronic Journal of Repetitive Elements. Trends Genet. 16 (9), 418–420. doi:10.1016/s0168-9525(00)02093-x
Kanehisa, M., Goto, S., Sato, Y., Kawashima, M., Furumichi, M., and Tanabe, M. (2013). Data, Information, Knowledge and Principle: Back to Metabolism in KEGG. Nucl. Acids Res. 42, D199–D205. doi:10.1093/nar/gkt1076
Ke, W., Bonilla-Rosso, G., Engel, P., Wang, P., Chen, F., and Hu, X. (2020). Suppression of High-Fat Diet-Induced Obesity by Platycodon grandiflorus in Mice Is Linked to Changes in the Gut Microbiota. J. Nutr. 150 (9), 2364–2374. doi:10.1093/jn/nxaa159
Kent, W. J. (2002). Blat-the BLAST-like Alignment Tool. Genome Res. 12 (4), 656–664. doi:10.1101/gr.229202
Kim, D., Langmead, B., and Salzberg, S. L. (2015). HISAT: a Fast Spliced Aligner with Low Memory Requirements. Nat. Methods 12 (4), 357–360. doi:10.1038/nmeth.3317
Kim, J., Kang, S.-H., Park, S.-G., Yang, T.-J., Lee, Y., Kim, O. T., et al. (2020). Whole-genome, Transcriptome, and Methylome Analyses Provide Insights into the Evolution of Platycoside Biosynthesis in Platycodon grandiflorus, a Medicinal Plant. Hortic. Res. 7, 112. doi:10.1038/s41438-020-0329-x
Kim, Y.-K., Sathasivam, R., Kim, Y. B., Kim, J. K., and Park, S. U. (2021). Transcriptomic Analysis, Cloning, Characterization, and Expression Analysis of Triterpene Biosynthetic Genes and Triterpene Accumulation in the Hairy Roots of Platycodon Grandiflorum Exposed to Methyl Jasmonate. ACS Omega 6 (19), 12820–12830. doi:10.1021/acsomega.1c01202
Korf, I. (2004). Gene Finding in Novel Genomes. BMC Bioinformatics 5, 59. doi:10.1186/1471-2105-5-59
Lee, S., Han, E. H., Lim, M.-K., Lee, S.-H., Yu, H. J., Lim, Y. H., et al. (2020). Fermented Platycodon Grandiflorum Extracts Relieve Airway Inflammation and Cough Reflex Sensitivity In Vivo. J. Med. Food 23 (10), 1060–1069. doi:10.1089/jmf.2019.4595
Li, H., and Durbin, R. (2009). Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform. Bioinformatics 25 (14), 1754–1760. doi:10.1093/bioinformatics/btp324
Li, L., Stoeckert, C. J., and Roos, D. S. (2003). OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes. Genome Res. 13 (9), 2178–2189. doi:10.1101/gr.1224503
Lv, Y., Tong, X., Zhang, P., Yu, N., Gui, S., Han, R., et al. (2021). Comparative Transcriptomic Analysis on white and Blue Flowers of Platycodon Grandiflorus to Elucidate Genes Involved in the Biosynthesis of Anthocyanins. Iran J. Biotechnol. 19 (3), e2811.
Majoros, W. H., Pertea, M., and Salzberg, S. L. (2004). TigrScan and GlimmerHMM: Two Open Source Ab Initio Eukaryotic Gene-Finders. Bioinformatics 20 (16), 2878–2879. doi:10.1093/bioinformatics/bth315
Marchler-Bauer, A., Lu, S., Anderson, J. B., Chitsaz, F., Derbyshire, M. K., DeWeese-Scott, C., et al. (2011). CDD: a Conserved Domain Database for the Functional Annotation of Proteins. Nucleic Acids Res. 39, D225–D229. doi:10.1093/nar/gkq1189
Marçais, G., and Kingsford, C. (2011). A Fast, Lock-free Approach for Efficient Parallel Counting of Occurrences of K-Mers. Bioinformatics 27 (6), 764–770. doi:10.1093/bioinformatics/btr011
Nyakudya, E., Jeong, J. H., Lee, N. K., and Jeong, Y.-S. (2014). Platycosides from the Roots of Platycodon Grandiflorum and Their Health Benefits. Jfn 19 (2), 59–68. doi:10.3746/pnf.2014.19.2.059
Pertea, M., Pertea, G. M., Antonescu, C. M., Chang, T.-C., Mendell, J. T., and Salzberg, S. L. (2015). StringTie Enables Improved Reconstruction of a Transcriptome from RNA-Seq Reads. Nat. Biotechnol. 33 (3), 290–295. doi:10.1038/nbt.3122
Price, A. L., Jones, N. C., and Pevzner, P. A. (2005). De Novo identification of Repeat Families in Large Genomes. Bioinformatics 21, i351–i358. doi:10.1093/bioinformatics/bti1018
Qiu, L., Xiao, Y., Liu, Y.-Q., Peng, L.-x., Liao, W., and Fu, Q. (2019). Platycosides P and Q, Two New Triterpene Saponins from Platycodon Grandiflorum. J. Asian Nat. Prod. Res. 21 (5), 419–425. doi:10.1080/10286020.2018.1488835
Rao, S. S. P., Huntley, M. H., Durand, N. C., Stamenova, E. K., Bochkov, I. D., Robinson, J. T., et al. (2014). A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell 159 (7), 1665–1680. doi:10.1016/j.cell.2014.11.021
Robinson, J. T., Turner, D., Durand, N. C., Thorvaldsdóttir, H., Mesirov, J. P., and Aiden, E. L. (2018). Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data. Cel Syst. 6 (2), 256–258. doi:10.1016/j.cels.2018.01.001
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., and Zdobnov, E. M. (2015). BUSCO: Assessing Genome Assembly and Annotation Completeness with Single-Copy Orthologs. Bioinformatics 31 (19), 3210–3212. doi:10.1093/bioinformatics/btv351
Stamatakis, A. (2014). RAxML Version 8: a Tool for Phylogenetic Analysis and post-analysis of Large Phylogenies. Bioinformatics 30 (9), 1312–1313. doi:10.1093/bioinformatics/btu033
Stanke, M., Diekhans, M., Baertsch, R., and Haussler, D. (2008). Using Native and Syntenically Mapped cDNA Alignments to Improve De Novo Gene Finding. Bioinformatics 24 (5), 637–644. doi:10.1093/bioinformatics/btn013
Su, X., Liu, Y., Han, L., Wang, Z., Cao, M., Wu, L., et al. (2021). A Candidate Gene Identified in Converting Platycoside E to Platycodin D from Platycodon grandiflorus by Transcriptome and Main Metabolites Analysis. Sci. Rep. 11 (1), 9810. doi:10.1038/s41598-021-89294-1
Sudhir, K., Glen, S., Michael, S., and Blair, H. (2017). TimeTree: A Resource for Timelines, Timetrees, and Divergence Times. Mol. Biol. Evol. 34 (7), 1812–1819.
Tarailo-Graovac, M., and Chen, N. (2009). Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Curr. Protoc. Bioinformatics Chapter 4, Unit 4.10. doi:10.1002/0471250953.bi0410s25
Wang, Y., Tang, H., Debarry, J. D., Tan, X., Li, J., Wang, X., et al. (2012). MCScanX: a Toolkit for Detection and Evolutionary Analysis of Gene Synteny and Collinearity. Nucleic Acids Res. 40 (7), e49. doi:10.1093/nar/gkr1293
Xu, Z., and Wang, H. (2007). LTR_FINDER: an Efficient Tool for the Prediction of Full-Length LTR Retrotransposons. Nucleic Acids Res. 35, W265–W268. doi:10.1093/nar/gkm286
Yang, F., Bao, G., Zhou, H., and Zhang, Y. (2016). Observation of Chromosome Number and Cytology Observation on Meiosis of Platycodon Grandiflorum. Gansu Agric. Sci. Technology 10, 14–16.
Yang, Z. (1997). PAML: a Program Package for Phylogenetic Analysis by Maximum Likelihood. Bioinformatics 13 (5), 555–556. doi:10.1093/bioinformatics/13.5.555
Yu, H., Liu, M., Yin, M., Shan, T., Peng, H., Wang, J., et al. (2021). Transcriptome Analysis Identifies Putative Genes Involved in Triterpenoid Biosynthesis in Platycodon grandiflorus. Planta 254 (2), 34. doi:10.1007/s00425-021-03677-2
Zdobnov, E. M., and Apweiler, R. (2001). InterProScan - an Integration Platform for the Signature-Recognition Methods in InterPro. Bioinformatics 17 (9), 847–848. doi:10.1093/bioinformatics/17.9.847
Zhang, L., Wang, Y., Yang, D., Zhang, C., Zhang, N., Li, M., et al. (2015). Platycodon grandiflorus - an Ethnopharmacological, Phytochemical and Pharmacological Review. J. Ethnopharmacology 164, 147–161. doi:10.1016/j.jep.2015.01.052
Zhang, X., Zhang, S., Zhao, Q., Ming, R., and Tang, H. (2019). Assembly of Allele-Aware, Chromosomal-Scale Autopolyploid Genomes Based on Hi-C Data. Nat. Plants 5 (8), 833–845. doi:10.1038/s41477-019-0487-8
Summary
Keywords
Platycodon grandiflorus, genome assembly, Oxford Nanopore, phylogenomics, Hi-C
Citation
Jia Y, Chen S, Chen W, Zhang P, Su Z, Zhang L, Xu M and Guo L (2022) A Chromosome-Level Reference Genome of Chinese Balloon Flower (Platycodon grandiflorus). Front. Genet. 13:869784. doi: 10.3389/fgene.2022.869784
Received
05 February 2022
Accepted
03 March 2022
Published
08 April 2022
Volume
13 - 2022
Edited by
Sunil Kumar Sahu, Beijing Genomics Institute (BGI), China
Reviewed by
Lei Zhang, Jiangsu Normal University, China
Xiaojun Nie, Northwest A&F University, China
Copyright
© 2022 Jia, Chen, Chen, Zhang, Su, Zhang, Xu and Guo.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Li Guo, li.guo@pku-iaas.edu.cn
This article was submitted to Plant Genomics, a section of the journal Frontiers in Genetics