A Chromosome-Level Reference Genome of Chinese Balloon Flower (Platycodon grandiflorus)

Jia, Yanyan; Chen, Shaoying; Chen, Weikai; Zhang, Ping; Su, Zhenjing; Zhang, Lei; Xu, Mengxin; Guo, Li

doi:10.3389/fgene.2022.869784

DATA REPORT article

Front. Genet., 08 April 2022

Sec. Genomics of Plants and Plant-Associated Organisms

Volume 13 - 2022 | https://doi.org/10.3389/fgene.2022.869784

Yanyan Jia ¹
Shaoying Chen ²,³
Weikai Chen ³
Ping Zhang ²
Zhenjing Su ²,³
Lei Zhang ²,³
Mengxin Xu ²,³
Li Guo ³,*

1. School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
2. School of Big Data, Weifang Institute of Technology, Weifang, China
3. Peking University Institute of Advanced Agricultural Sciences, Weifang, China

Introduction

Chinese balloon flower (Platycodon grandiflorus) is the sole species in the genus Platycodon within the family Campanulaceae. Its typical blue-purple or white flowers are frequently used for ornamental purposes (Lv et al., 2021). As a traditional oriental medicine used to treat chronic inflammatory diseases, P. grandiflorus roots have a range of pharmacological activities, including expectorant, antitussive, anti-inflammatory, immune-regulatory and anti-tumor effects (Choi et al., 2010; Nyakudya et al., 2014; Buchwald et al., 2020; Ke et al., 2020; Lee et al., 2020). The dried root, Platycodi radix, is officially listed as a traditional herbal medicine in the Chinese, Korean and Japanese Pharmacopoeias (Su et al., 2021). Platycodi radix is also pickled in northeast China and made into kimchi on the Korean Peninsula. Market demand for P. grandiflorus is growing as its applications expand in medicine, food, health products, cosmetics and ornamental horticulture (Ji et al., 2020).

Over 100 secondary metabolites have been isolated from P. grandiflorus, including triterpenoid saponins, flavonoids, polyphenols and polysaccharides (Zhang et al., 2015; Qiu et al., 2019; Huang et al., 2021). So far, studies have focused on the pharmacology and metabolic pathways of the main active ingredients, the triterpenoid saponins (Kim et al., 2020; Kim et al., 2021; Yu et al., 2021). However, the molecular basis of the biochemical pathways for P. grandiflorus secondary metabolites remains poorly understood overall, which hinders molecular breeding and metabolic engineering aimed at increased production and utilization of its natural products. A high-quality genome assembly of P. grandiflorus will significantly accelerate the genetic characterization of secondary metabolic pathways and their regulatory mechanisms, as well as genome-assisted breeding.

Previously, Kim et al. assembled a draft genome sequence of P. grandiflorus (2n = 2x = 18) from Illumina short reads, yielding a fragmented assembly with a scaffold N50 of 277 kb (Kim et al., 2020). In this study, we assembled and annotated a chromosome-scale reference genome for P. grandiflorus cultivar XJD. This genome assembly has a total length of 622.86 Mb, is anchored to nine chromosomes and has high contiguity (contig N50 = 29.34 Mb, scaffold N50 = 65.83 Mb), representing a significant improvement over the previously published draft genome of P. grandiflorus (Kim et al., 2020). The chromosome-scale genome assembly will advance our understanding of the genome function and evolution of P. grandiflorus, and facilitate its molecular breeding and metabolic engineering.

Results and Discussion

Genome Assembly

To produce a chromosome-level genome assembly of P. grandiflorus cultivar XJD, we generated about 73 Gb of Nanopore long reads with an average read length of 24 kb, 112 Gb of Illumina paired-end short reads of 150 bp, and 311 Gb of high-throughput chromatin conformation capture (Hi-C) sequencing data. Based on K-mer analysis of the Illumina reads, the P. grandiflorus genome was estimated to be 642.38 Mb in length, with a heterozygosity rate of 0.92% and a repeat content of 60% (Supplementary Table S1, Supplementary Figure S1). Nanopore long reads were first assembled with NextDenovo, producing a draft assembly of 622.86 Mb with a contig N50 of 29.34 Mb (Supplementary Table S2) after base correction by Pilon using Illumina reads. The quality of the genome assembly was evaluated by mapping Illumina short reads to the assembly: 99.3% of the short reads mapped, covering 96.8% of the assembled genome. Furthermore, BUSCO analysis showed that the genome assembly captured 98.1% complete BUSCOs, including 95.5% single-copy and 2.6% duplicated (Supplementary Table S3), indicating that the assembly is highly complete.

Hi-C data were then used to anchor the assembled contigs to individual chromosomes using ALLHiC (Zhang et al., 2019) and Juicebox (Robinson et al., 2018), yielding nine pseudomolecules ranging from 47.09 to 104.37 Mb and accounting for 95% of the assembly. The Hi-C contact map showed that the nine pseudochromosomes could be clearly distinguished (Figure 1; Supplementary Table S4), consistent with the karyotype (2n = 2x = 18) reported in the literature (Yang et al., 2016). The final genome assembly of P. grandiflorus was 622.86 Mb, with a contig N50 of 29.34 Mb and a scaffold N50 of 65.83 Mb. It is far more contiguous than the previously reported P. grandiflorus (cultivar Jangbaek-doraji) genome assembly (Kim et al., 2020), which has a scaffold N50 of only 0.277 Mb (Supplementary Table S2). Whole-genome sequence comparison showed that the two assemblies aligned well: 4,815 scaffolds of the Jangbaek-doraji assembly aligned to 99 scaffolds (95% anchored to nine chromosomes) of our XJD assembly (Supplementary Figure S2).
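
The contig and scaffold N50 values used here to compare assemblies follow the usual definition: the length L such that sequences of length L or longer together cover at least half of the total assembly. A minimal Python sketch of that calculation is given below. The contig lengths are made-up toy values, not data from the XJD assembly, and the function name is chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in Mb (illustrative only, not from this study).
toy_contigs = [40, 30, 20, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing about base-level accuracy, which is why read mapping and BUSCO are reported alongside it.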

FIGURE 1

Genome Annotation

We then annotated the genome by combining ab initio prediction, protein homology and transcriptome data from leaves, roots and stems (Methods). The annotation identified 360.46 Mb of repeat sequences in the P. grandiflorus genome, accounting for 57.87% of the genome. The two largest categories of repetitive elements were long terminal repeats (LTRs; 51.2%) and DNA elements (2.64%). A total of 22,358 protein-coding genes were predicted in the genome, 96.91% of which were assigned a predicted function by alignment against a library of known proteins from related plant species (Supplementary Table S5). Furthermore, non-coding RNAs were predicted across the P. grandiflorus genome, comprising 1,867 microRNAs (miRNAs), 989 transfer RNAs (tRNAs), 780 ribosomal RNAs (rRNAs), and 1,114 small nuclear RNAs (snRNAs).
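
As a quick consistency check on these figures, the repeat percentage can be recomputed from the reported repeat length and assembly length, and the percentage of functionally annotated genes can be converted to an approximate gene count. The short Python sketch below uses only numbers quoted in the text; the derived gene count is our arithmetic, not a value reported by the authors.

```python
assembly_mb = 622.86          # total assembly length reported in the text
repeat_mb = 360.46            # annotated repeat sequence reported in the text
predicted_genes = 22358       # protein-coding genes reported in the text
annotated_fraction = 0.9691   # share of genes with a predicted function

# Repeat content as a percentage of the assembly.
repeat_pct = 100 * repeat_mb / assembly_mb
# Approximate number of genes with a predicted function (derived value).
annotated_genes = round(predicted_genes * annotated_fraction)

print(f"repeat content: {repeat_pct:.2f}%")
print(f"genes with a predicted function: about {annotated_genes}")
```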

Comparative Phylogenomics of P. grandiflorus

To determine the evolutionary relationships among P. grandiflorus and other species, we identified 1,436 single-copy orthologs from 10 representative plant species using OrthoMCL (Li et al., 2003) (Figure 2A). Protein sequence alignments of these orthologs were generated with MUSCLE (Edgar, 2004) and used to build a phylogenetic tree with Oryza sativa as the outgroup (Figure 2B). Mikania micrantha, Helianthus annuus and Lactuca sativa were most closely related to P. grandiflorus, with a divergence time of around 73.8 million years ago (Mya) (Figure 2B). Gene family evolution analysis using CAFE on the 10 plant species suggested that P. grandiflorus has 27 significantly expanded and 64 significantly contracted gene families (Figure 2C). The expanded gene families were enriched in 19 GO categories and 12 KEGG pathways, most of which were related to the biosynthesis of secondary metabolites such as brassinosteroid, flavonoid, stilbenoid and gingerol, and to signaling pathways such as the MAPK pathway (Supplementary Tables S6 and S7). Notably, P. grandiflorus contained 1,079 species-specific gene families comprising 1,914 genes relative to M. micrantha, H. annuus and L. sativa (Figure 2D). GO enrichment analysis was then performed on these species-specific genes (Supplementary Table S8). Positively selected genes in P. grandiflorus were identified by comparison with H. annuus and M. micrantha. GO and KEGG analyses showed that these genes were significantly enriched in DNA repair, cellular response to stress and stimulus, DNA metabolic process, nucleic acid metabolic process and DNA recombination, among others (Supplementary Tables S9 and S10).

FIGURE 2

Materials and Methods

Plant Materials, Library Construction, and Sequencing

Fresh leaf, stem and root samples were collected from four-week-old seedlings of P. grandiflorus cultivar XJD grown in a plant growth chamber with a 16-h light photoperiod. The tissues were flash-frozen in liquid nitrogen and used for total genomic DNA or RNA extraction. Total genomic DNA was extracted from P. grandiflorus leaves using a DNeasy Plant Mini Kit (Qiagen), followed by PCR-free library construction using the Illumina TruSeq DNA PCR-Free Library Preparation Kit according to the manufacturer’s instructions. The libraries were sequenced on an Illumina HiSeq X Ten platform to generate 150 bp paired-end reads, which were used to perform the genome survey, polish the genome assembly, and evaluate assembly quality.

For ONT and Hi-C sequencing, fresh young leaves were used for DNA isolation and library construction. For ONT sequencing, total genomic DNA was extracted from leaf samples using the CTAB method. ONT library preparation comprised fragment repair, ligation reactions and quantitative detection of the finished library. Single-molecule real-time sequencing was then carried out on a Nanopore PromethION sequencer to obtain the raw data, which were subsequently error-corrected to yield high-fidelity sequence data. The Hi-C sequencing libraries were generated following a standard procedure described previously (Rao et al., 2014), involving DNA crosslinking, restriction enzyme digestion, end filling and biotin labeling, ligation, and DNA purification and capture using antibody. The Hi-C libraries were subjected to quality control before being sequenced on an Illumina HiSeq X Ten platform. For transcriptome sequencing, total RNA was extracted from leaves, stems and roots of P. grandiflorus using the Plant RNA Purification Reagent (Qiagen) according to the manufacturer’s instructions. RNA-seq transcriptome libraries were prepared using the TruSeq RNA Sample Preparation Kit (Illumina), and sequencing was performed on an Illumina HiSeq X Ten platform.

De Novo Genome Assembly

K-mer frequency analysis was performed using Jellyfish V2.0 (Marçais and Kingsford, 2011) to estimate the P. grandiflorus genome size, heterozygosity and repeat content. NextDenovo (https://github.com/Nextomics/NextDenovo) was used to assemble the P. grandiflorus genome from ONT long reads, and the Nanopore-based assembly was then polished with the Illumina DNA short reads using NextPolish V1.3.1 (Hu et al., 2020) with default parameters to improve base accuracy. Next, ALLHiC V0.9.8 (Zhang et al., 2019) was used with default parameters to order and anchor the preliminarily assembled contigs into chromosomes based on the Hi-C data. Finally, we used Juicebox V1.1 (Robinson et al., 2018) to adjust the assembly manually against the Hi-C heatmap and produce the chromosome-level version of the genome. To assess the accuracy and completeness of the assembly, Illumina clean reads were mapped to it using BWA (Li and Durbin, 2009). In addition, BUSCO (Simão et al., 2015) was used to assess the completeness of the genome assembly.
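
The text does not state how genome size was derived from the K-mer counts. One common textbook estimator divides the total number of K-mers, after discarding the low-multiplicity K-mers that mostly arise from sequencing errors, by the multiplicity of the main histogram peak. The Python sketch below illustrates that idea under stated assumptions: the input is a dictionary mapping multiplicity to the number of distinct K-mers (for instance parsed from the two-column table written by `jellyfish histo`), the error cutoff is taken as the first local minimum, and heterozygosity and repeats are not modeled. The histogram values are toy numbers, and this is not presented as the authors' procedure.

```python
def estimate_genome_size(histogram):
    """Estimate genome size from a K-mer multiplicity histogram.

    histogram maps multiplicity -> number of distinct K-mers observed
    that many times. Returns (estimated size, peak multiplicity).
    """
    if not histogram:
        raise ValueError("empty histogram")
    depths = sorted(histogram)
    # Low-multiplicity K-mers are mostly errors: skip everything before
    # the first local minimum of the histogram.
    cutoff = depths[0]
    for prev, cur in zip(depths, depths[1:]):
        if histogram[cur] > histogram[prev]:
            cutoff = prev
            break
    kept = [d for d in depths if d >= cutoff]
    # The tallest remaining bin approximates the K-mer coverage depth.
    peak_depth = max(kept, key=lambda d: histogram[d])
    total_kmers = sum(d * histogram[d] for d in kept)
    return total_kmers / peak_depth, peak_depth


# Toy histogram (illustrative only, not data from this study).
toy_histogram = {1: 5000, 2: 800, 3: 100, 4: 300, 5: 900, 6: 300, 7: 100}
size, peak = estimate_genome_size(toy_histogram)
print(size, peak)
```

For a genome with appreciable heterozygosity, such as the 0.92% reported here, the histogram typically shows a second peak at about half the main depth, so dedicated model-fitting tools are generally preferred over this simple ratio.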

Genome Annotation

Genome annotation comprised repetitive sequence annotation, gene annotation and non-coding RNA annotation. First, transcripts were assembled from the RNA-seq reads with Trinity (Grabherr et al., 2013) for use in genome annotation. To optimize the annotation, the RNA-seq reads from different tissues were aligned to the draft genome using HISAT2 (Kim et al., 2015) with default parameters to identify exon regions and splice positions. The alignments were then used as input for StringTie (Pertea et al., 2015) with default parameters for genome-based transcript assembly.

Repeat sequences were annotated using both homology-based and ab initio approaches. Tandem repeats were identified ab initio using Tandem Repeats Finder (Benson, 1999). RepeatModeler (Flynn et al., 2020), RepeatScout (Price et al., 2005) and LTR-Finder (Xu and Wang, 2007) were used with default parameters to build an ab initio repeat element library, and RepeatMasker (Tarailo-Graovac and Chen, 2009) was used to annotate repetitive elements against this library. RepeatMasker and RepeatProteinMask were used to search the genome sequence for known repetitive elements, with the genome sequences used as queries against the Repbase database (Jurka, 2000).

For gene structure prediction, Augustus (Stanke et al., 2008), GlimmerHMM (Majoros et al., 2004) and SNAP (Korf, 2004) were used for de novo prediction. Blast (Kent, 2002) and GeneWise (Birney et al., 2004) were used for homology-based annotation. EvidenceModeler (EVM; Haas et al., 2008) was then applied to integrate the homology-based, de novo and transcriptome-based predictions into a non-redundant, more complete gene set. Finally, we used PASA (Haas et al., 2003), together with the transcriptome assemblies, to correct the EVM annotations and to add UTRs, alternative splicing variants and other information, yielding the final gene set. This final gene set was compared against public databases, including SwissProt (Bairoch and Apweiler, 2000), NR (Marchler-Bauer et al., 2011), Pfam (Griffiths-Jones et al., 2005), KEGG (Kanehisa et al., 2013), GO (Ashburner et al., 2000) and InterPro (Zdobnov and Apweiler, 2001), for functional annotation of protein-coding genes. In addition, we predicted several classes of non-coding RNAs. tRNAs were predicted using tRNAscan-SE (Chan and Lowe, 2019). Because rRNAs are highly conserved, rRNA sequences were predicted using BLAST. Other ncRNAs were identified by searching against the Rfam database with default parameters using Infernal (Griffiths-Jones et al., 2005).

Phylogenomic Analysis

Synteny analysis was conducted using MCScanX (Wang et al., 2012) applied to BLASTp results of P. grandiflorus protein sequences. For the phylogenetic analysis, OrthoMCL (Li et al., 2003) was first used to detect multi-copy and single-copy gene families among P. grandiflorus and the other representative species. All single-copy gene families were then aligned using MUSCLE (Edgar, 2004), and the resulting alignments were concatenated into a super-alignment matrix. RAxML (Stamatakis, 2014) was used to construct the species phylogenetic tree, with Oryza sativa as the outgroup and the bootstrap value set to 100. MCMCTREE in PAML (Yang, 1997) was used to estimate divergence times. The calibration points were: Solanum lycopersicum–Helianthus annuus (95–106 Mya), Vitis vinifera–Arabidopsis thaliana (105–115 Mya), P. grandiflorus–Vitis vinifera (111–131 Mya) and P. grandiflorus–Oryza sativa (148–173 Mya). These calibration points were taken from the TimeTree website (Sudhir et al., 2017).
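
The step of combining per-family alignments into a single super-alignment matrix can be sketched in a few lines of Python. This is a minimal illustration, assuming each single-copy gene family has already been aligned and is held as a dictionary from species name to aligned sequence. The species labels and sequences are hypothetical, and the paper does not describe the script actually used.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one supermatrix.

    alignments is a list of dicts, each mapping species name -> aligned
    sequence for one single-copy gene family.
    """
    if not alignments:
        raise ValueError("no alignments given")
    species = sorted(alignments[0])
    parts = {name: [] for name in species}
    for gene in alignments:
        # Single-copy families contribute exactly one sequence per species.
        if sorted(gene) != species:
            raise ValueError("every family needs one sequence per species")
        # All rows of one alignment must share the same column count.
        if len({len(seq) for seq in gene.values()}) != 1:
            raise ValueError("sequences in an alignment differ in length")
        for name in species:
            parts[name].append(gene[name])
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Hypothetical species labels and toy aligned protein fragments.
family_1 = {"Pgr": "MK-L", "Han": "MKAL", "Osa": "MR-L"}
family_2 = {"Pgr": "GT", "Han": "GS", "Osa": "AT"}
supermatrix = concatenate_alignments([family_1, family_2])
print(supermatrix["Pgr"])
```

Keeping the species order fixed across families is what allows every column of the supermatrix to be treated as a homologous site during tree inference.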

Gene Family Analysis

The CAFE software (Han et al., 2013) was used to analyze gene family expansion and contraction, based on the estimated divergence times and phylogenetic relationships. To reduce false positives, the CAFE results were filtered, retaining only gene families with both a family-wide p-value < 0.05 and a Viterbi p-value < 0.05. Enrichment analyses based on GO and KEGG annotations were then performed to identify the functional implications of the expanded and contracted gene families.
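
The two-threshold filter described above can be expressed as a short Python sketch. The record layout is hypothetical: the field names `family`, `family_p`, `viterbi_p` and `change` are assumptions made for illustration and do not reproduce CAFE's actual output format, which would need to be parsed into this shape first.

```python
def significant_families(records, alpha=0.05):
    """Split gene families passing both p-value cutoffs by direction.

    Each record is a dict with hypothetical keys: 'family' (identifier),
    'family_p' (family-wide p-value), 'viterbi_p' (branch-level p-value
    for the focal species) and 'change' (gene-count change on that branch).
    """
    expanded, contracted = [], []
    for rec in records:
        # A family is kept only if BOTH p-values are below the cutoff.
        if rec["family_p"] >= alpha or rec["viterbi_p"] >= alpha:
            continue
        if rec["change"] > 0:
            expanded.append(rec["family"])
        elif rec["change"] < 0:
            contracted.append(rec["family"])
    return expanded, contracted


# Toy records (illustrative only, not results from this study).
toy_records = [
    {"family": "F1", "family_p": 0.001, "viterbi_p": 0.010, "change": 5},
    {"family": "F2", "family_p": 0.030, "viterbi_p": 0.200, "change": 3},
    {"family": "F3", "family_p": 0.004, "viterbi_p": 0.020, "change": -2},
    {"family": "F4", "family_p": 0.300, "viterbi_p": 0.010, "change": -4},
]
expanded, contracted = significant_families(toy_records)
print(expanded, contracted)
```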

Positive Selection Analysis

The protein sequences of single-copy gene families were extracted and aligned with MUSCLE (Edgar, 2004). The Codeml program of the PAML package was applied for positive selection analysis using the branch-site model, with H. annuus and M. micrantha as the background branches. The likelihood ratio test was used to detect candidate genes that underwent positive selection, with a p-value cutoff of 0.05. Fisher’s test and FDR correction (q-value < 0.05) were used for functional enrichment analysis of these positively selected genes.
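
The two statistical steps named above can be illustrated with standard-library Python. The sketch makes two assumptions that the text does not confirm: the likelihood ratio statistic is compared with a chi-square distribution with one degree of freedom, and the FDR correction is the Benjamini-Hochberg procedure. The log-likelihoods and enrichment p-values are toy numbers, and the function names are ours.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """P-value of a likelihood ratio test against chi-square with 1 df."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For one degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical (null, alternative) log-likelihoods for three genes.
toy_fits = {
    "geneA": (-1000.0, -995.0),
    "geneB": (-2000.0, -1999.5),
    "geneC": (-1500.0, -1497.0),
}
pvals = {gene: lrt_pvalue(l0, l1) for gene, (l0, l1) in toy_fits.items()}
candidates = [gene for gene, p in pvals.items() if p < 0.05]
print(candidates)

# Hypothetical enrichment p-values for four functional terms.
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(qvals)
```

With these toy inputs, a term with a raw p-value of 0.04 ends up with a q-value above 0.05, which shows how the correction tightens the enrichment threshold when several terms are tested.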

Statements

Data availability statement

The whole genome sequence data reported in this paper have been deposited in the Genome Warehouse in National Genomics Data Center (BioProject: PRJCA003843), Beijing Institute of Genomics, Chinese Academy of Sciences/China National Center for Bioinformation (GWH: GWHARYT00000000.1), publicly accessible at https://ngdc.cncb.ac.cn/gwh. The raw sequencing data for the ONT long reads, Illumina short reads, Hi-C reads and RNA-seq reads have been deposited in the Genome Sequence Archive at the National Genomics Data Center (GSA: CRA003503), publicly accessible at http://bigd.big.ac.cn/gsa. The genome annotation has been deposited at https://doi.org/10.6084/m9.figshare.19093331.v1.

Author contributions

Project design and oversight: LG; Sample collection and curation: YJ; Conducting experiments and data analysis: YJ, SC, LZ, MX, ZS; Figure and table preparation: YJ, WC, SC; Result interpretation and discussion: YJ, PZ, LG; Manuscript writing and revision: YJ, WC, PZ, LG; Funding acquisition: LG, YJ. All authors read and approved the final version of this manuscript.

Funding

This project was supported by the National Natural Science Foundation of China (Grant No. 31970317) and the Chinese Postdoctoral Research Foundation (Grant No. 2020M683514). LG is also supported by a faculty startup package from Peking University Institute of Advanced Agricultural Sciences.

Acknowledgments

The authors would like to thank Dr. Bo Wang at Xi’an Jiaotong University for technical assistance, and the anonymous reviewers for their comments and suggestions to improve this manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2022.869784/full#supplementary-material

References

1. Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., et al. (2000). Gene Ontology: Tool for the Unification of Biology. Nat. Genet. 25 (1), 25–29. doi:10.1038/75556
2. Bairoch, A., and Apweiler, R. (2000). The SWISS-PROT Protein Sequence Database and its Supplement TrEMBL in 2000. Nucleic Acids Res. 28 (1), 45–48. doi:10.1093/nar/28.1.45
3. Benson, G. (1999). Tandem Repeats Finder: a Program to Analyze DNA Sequences. Nucleic Acids Res. 27 (2), 573–580. doi:10.1093/nar/27.2.573
4. Birney, E., Clamp, M., and Durbin, R. (2004). GeneWise and Genomewise. Genome Res. 14 (5), 988–995. doi:10.1101/gr.1865504
5. Buchwald, W., Szulc, M., Baraniak, J., Derebecka, N., Kania-Dobrowolska, M., Piasecka, A., et al. (2020). The Effect of Different Water Extracts from Platycodon Grandiflorum on Selected Factors Associated with Pathogenesis of Chronic Bronchitis in Rats. Molecules 25 (21), 5020. doi:10.3390/molecules25215020
6. Chan, P. P., and Lowe, T. M. (2019). tRNAscan-SE: Searching for tRNA Genes in Genomic Sequences. Methods Mol. Biol. 1962, 1–14. doi:10.1007/978-1-4939-9173-0_1
7. Choi, Y. H., Yoo, D. S., Cha, M.-R., Choi, C. W., Kim, Y. S., Choi, S.-U., et al. (2010). Antiproliferative Effects of Saponins from the Roots of Platycodon Grandiflorum on Cultured Human Tumor Cells. J. Nat. Prod. 73 (11), 1863–1867. doi:10.1021/np100496p
8. Edgar, R. C. (2004). MUSCLE: Multiple Sequence Alignment with High Accuracy and High Throughput. Nucleic Acids Res. 32 (5), 1792–1797. doi:10.1093/nar/gkh340
9. Flynn, J. M., Hubley, R., Goubert, C., Rosen, J., Clark, A. G., Feschotte, C., et al. (2020). RepeatModeler2 for Automated Genomic Discovery of Transposable Element Families. Proc. Natl. Acad. Sci. USA 117 (17), 9451–9457. doi:10.1073/pnas.1921046117
10. Grabherr, M., Haas, B., Yassour, M., Levin, J., Thompson, D., Amit, I., et al. (2013). Trinity: Reconstructing a Full-Length Transcriptome without a Genome from RNA-Seq Data. Nat. Biotechnol. 29 (7), 644–652.
11. Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy, S. R., and Bateman, A. (2005). Rfam: Annotating Non-coding RNAs in Complete Genomes. Nucleic Acids Res. 33, D121–D124. doi:10.1093/nar/gki081
12. Haas, B. J., Delcher, A., Mount, S., Wortman, J., Smith, R., Hannick, L., et al. (2003). Improving the Arabidopsis Genome Annotation Using Maximal Transcript Alignment Assemblies. Nucleic Acids Res. 31 (19), 5654–5666. doi:10.1093/nar/gkg770
13. Haas, B. J., Salzberg, S. L., Zhu, W., Pertea, M., Allen, J. E., Orvis, J., et al. (2008). Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9 (1), R7. doi:10.1186/gb-2008-9-1-r7
14. Han, M. V., Thomas, G. W. C., Lugo-Martinez, J., and Hahn, M. W. (2013). Estimating Gene Gain and Loss Rates in the Presence of Error in Genome Assembly and Annotation Using CAFE 3. Mol. Biol. Evol. 30 (8), 1987–1997. doi:10.1093/molbev/mst100
15. Hu, J., Fan, J., Sun, Z., and Liu, S. (2020). NextPolish: a Fast and Efficient Genome Polishing Tool for Long-Read Assembly. Bioinformatics 36 (7), 2253–2255. doi:10.1093/bioinformatics/btz891
16. Huang, W., Zhou, H., Yuan, M., Lan, L., Hou, A., and Ji, S. (2021). Comprehensive Characterization of the Chemical Constituents in Platycodon Grandiflorum by an Integrated Liquid Chromatography-Mass Spectrometry Strategy. J. Chromatogr. A 1654, 462477. doi:10.1016/j.chroma.2021.462477
17. Ji, M.-Y., Bo, A., Yang, M., Xu, J.-F., Jiang, L.-L., Zhou, B.-C., et al. (2020). The Pharmacological Effects and Health Benefits of Platycodon grandiflorus - A Medicine Food Homology Species. Foods 9 (2), 142. doi:10.3390/foods9020142
18. Jurka, J. (2000). Repbase Update: a Database and an Electronic Journal of Repetitive Elements. Trends Genet. 16 (9), 418–420. doi:10.1016/s0168-9525(00)02093-x
19. Kanehisa, M., Goto, S., Sato, Y., Kawashima, M., Furumichi, M., and Tanabe, M. (2013). Data, Information, Knowledge and Principle: Back to Metabolism in KEGG. Nucleic Acids Res. 42, D199–D205. doi:10.1093/nar/gkt1076
20. Ke, W., Bonilla-Rosso, G., Engel, P., Wang, P., Chen, F., and Hu, X. (2020). Suppression of High-Fat Diet-Induced Obesity by Platycodon grandiflorus in Mice Is Linked to Changes in the Gut Microbiota. J. Nutr. 150 (9), 2364–2374. doi:10.1093/jn/nxaa159
21. Kent, W. J. (2002). BLAT - the BLAST-like Alignment Tool. Genome Res. 12 (4), 656–664. doi:10.1101/gr.229202
22. Kim, D., Langmead, B., and Salzberg, S. L. (2015). HISAT: a Fast Spliced Aligner with Low Memory Requirements. Nat. Methods 12 (4), 357–360. doi:10.1038/nmeth.3317
23. Kim, J., Kang, S.-H., Park, S.-G., Yang, T.-J., Lee, Y., Kim, O. T., et al. (2020). Whole-genome, Transcriptome, and Methylome Analyses Provide Insights into the Evolution of Platycoside Biosynthesis in Platycodon grandiflorus, a Medicinal Plant. Hortic. Res. 7, 112. doi:10.1038/s41438-020-0329-x
24. Kim, Y.-K., Sathasivam, R., Kim, Y. B., Kim, J. K., and Park, S. U. (2021). Transcriptomic Analysis, Cloning, Characterization, and Expression Analysis of Triterpene Biosynthetic Genes and Triterpene Accumulation in the Hairy Roots of Platycodon Grandiflorum Exposed to Methyl Jasmonate. ACS Omega 6 (19), 12820–12830. doi:10.1021/acsomega.1c01202
25. Korf, I. (2004). Gene Finding in Novel Genomes. BMC Bioinformatics 5, 59. doi:10.1186/1471-2105-5-59
26. Lee, S., Han, E. H., Lim, M.-K., Lee, S.-H., Yu, H. J., Lim, Y. H., et al. (2020). Fermented Platycodon Grandiflorum Extracts Relieve Airway Inflammation and Cough Reflex Sensitivity In Vivo. J. Med. Food 23 (10), 1060–1069. doi:10.1089/jmf.2019.4595
27. Li, H., and Durbin, R. (2009). Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform. Bioinformatics 25 (14), 1754–1760. doi:10.1093/bioinformatics/btp324
28. Li, L., Stoeckert, C. J., and Roos, D. S. (2003). OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes. Genome Res. 13 (9), 2178–2189. doi:10.1101/gr.1224503
29. Lv, Y., Tong, X., Zhang, P., Yu, N., Gui, S., Han, R., et al. (2021). Comparative Transcriptomic Analysis on White and Blue Flowers of Platycodon Grandiflorus to Elucidate Genes Involved in the Biosynthesis of Anthocyanins. Iran J. Biotechnol. 19 (3), e2811.
30. Majoros, W. H., Pertea, M., and Salzberg, S. L. (2004). TigrScan and GlimmerHMM: Two Open Source Ab Initio Eukaryotic Gene-Finders. Bioinformatics 20 (16), 2878–2879. doi:10.1093/bioinformatics/bth315
31. Marchler-Bauer, A., Lu, S., Anderson, J. B., Chitsaz, F., Derbyshire, M. K., DeWeese-Scott, C., et al. (2011). CDD: a Conserved Domain Database for the Functional Annotation of Proteins. Nucleic Acids Res. 39, D225–D229. doi:10.1093/nar/gkq1189
32. Marçais, G., and Kingsford, C. (2011). A Fast, Lock-free Approach for Efficient Parallel Counting of Occurrences of K-Mers. Bioinformatics 27 (6), 764–770. doi:10.1093/bioinformatics/btr011
33. Nyakudya, E., Jeong, J. H., Lee, N. K., and Jeong, Y.-S. (2014). Platycosides from the Roots of Platycodon Grandiflorum and Their Health Benefits. Jfn 19 (2), 59–68. doi:10.3746/pnf.2014.19.2.059
34. Pertea, M., Pertea, G. M., Antonescu, C. M., Chang, T.-C., Mendell, J. T., and Salzberg, S. L. (2015). StringTie Enables Improved Reconstruction of a Transcriptome from RNA-Seq Reads. Nat. Biotechnol. 33 (3), 290–295. doi:10.1038/nbt.3122
35. Price, A. L., Jones, N. C., and Pevzner, P. A. (2005). De Novo Identification of Repeat Families in Large Genomes. Bioinformatics 21, i351–i358. doi:10.1093/bioinformatics/bti1018
36. Qiu, L., Xiao, Y., Liu, Y.-Q., Peng, L.-X., Liao, W., and Fu, Q. (2019). Platycosides P and Q, Two New Triterpene Saponins from Platycodon Grandiflorum. J. Asian Nat. Prod. Res. 21 (5), 419–425. doi:10.1080/10286020.2018.1488835
37. Rao, S. S. P., Huntley, M. H., Durand, N. C., Stamenova, E. K., Bochkov, I. D., Robinson, J. T., et al. (2014). A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell 159 (7), 1665–1680. doi:10.1016/j.cell.2014.11.021
38. Robinson, J. T., Turner, D., Durand, N. C., Thorvaldsdóttir, H., Mesirov, J. P., and Aiden, E. L. (2018). Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data. Cell Syst. 6 (2), 256–258. doi:10.1016/j.cels.2018.01.001
39. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., and Zdobnov, E. M. (2015). BUSCO: Assessing Genome Assembly and Annotation Completeness with Single-Copy Orthologs. Bioinformatics 31 (19), 3210–3212. doi:10.1093/bioinformatics/btv351
40. Stamatakis, A. (2014). RAxML Version 8: a Tool for Phylogenetic Analysis and Post-analysis of Large Phylogenies. Bioinformatics 30 (9), 1312–1313. doi:10.1093/bioinformatics/btu033
41. Stanke, M., Diekhans, M., Baertsch, R., and Haussler, D. (2008). Using Native and Syntenically Mapped cDNA Alignments to Improve De Novo Gene Finding. Bioinformatics 24 (5), 637–644. doi:10.1093/bioinformatics/btn013
42. Su, X., Liu, Y., Han, L., Wang, Z., Cao, M., Wu, L., et al. (2021). A Candidate Gene Identified in Converting Platycoside E to Platycodin D from Platycodon grandiflorus by Transcriptome and Main Metabolites Analysis. Sci. Rep. 11 (1), 9810. doi:10.1038/s41598-021-89294-1
43. Sudhir, K., Glen, S., Michael, S., and Blair, H. (2017). TimeTree: A Resource for Timelines, Timetrees, and Divergence Times. Mol. Biol. Evol. 34 (7), 1812–1819.
44. Tarailo-Graovac, M., and Chen, N. (2009). Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Curr. Protoc. Bioinformatics Chapter 4, Unit 4.10. doi:10.1002/0471250953.bi0410s25
45. Wang, Y., Tang, H., Debarry, J. D., Tan, X., Li, J., Wang, X., et al. (2012). MCScanX: a Toolkit for Detection and Evolutionary Analysis of Gene Synteny and Collinearity. Nucleic Acids Res. 40 (7), e49. doi:10.1093/nar/gkr1293
46. Xu, Z., and Wang, H. (2007). LTR_FINDER: an Efficient Tool for the Prediction of Full-Length LTR Retrotransposons. Nucleic Acids Res. 35, W265–W268. doi:10.1093/nar/gkm286
47. Yang, F., Bao, G., Zhou, H., and Zhang, Y. (2016). Observation of Chromosome Number and Cytology Observation on Meiosis of Platycodon Grandiflorum. Gansu Agric. Sci. Technology 10, 14–16.
48. Yang, Z. (1997). PAML: a Program Package for Phylogenetic Analysis by Maximum Likelihood. Bioinformatics 13 (5), 555–556. doi:10.1093/bioinformatics/13.5.555
49. Yu, H., Liu, M., Yin, M., Shan, T., Peng, H., Wang, J., et al. (2021). Transcriptome Analysis Identifies Putative Genes Involved in Triterpenoid Biosynthesis in Platycodon grandiflorus. Planta 254 (2), 34. doi:10.1007/s00425-021-03677-2
50. Zdobnov, E. M., and Apweiler, R. (2001). InterProScan - an Integration Platform for the Signature-Recognition Methods in InterPro. Bioinformatics 17 (9), 847–848. doi:10.1093/bioinformatics/17.9.847
51. Zhang, L., Wang, Y., Yang, D., Zhang, C., Zhang, N., Li, M., et al. (2015). Platycodon grandiflorus - an Ethnopharmacological, Phytochemical and Pharmacological Review. J. Ethnopharmacology 164, 147–161. doi:10.1016/j.jep.2015.01.052
52. Zhang, X., Zhang, S., Zhao, Q., Ming, R., and Tang, H. (2019). Assembly of Allele-Aware, Chromosomal-Scale Autopolyploid Genomes Based on Hi-C Data. Nat. Plants 5 (8), 833–845. doi:10.1038/s41477-019-0487-8

Keywords

Platycodon grandiflorus, genome assembly, Oxford Nanopore, phylogenomics, Hi-C

Citation

Jia Y, Chen S, Chen W, Zhang P, Su Z, Zhang L, Xu M and Guo L (2022) A Chromosome-Level Reference Genome of Chinese Balloon Flower (Platycodon grandiflorus). Front. Genet. 13:869784. doi: 10.3389/fgene.2022.869784

Received: 05 February 2022
Accepted: 03 March 2022
Published: 08 April 2022

Edited by: Sunil Kumar Sahu, Beijing Genomics Institute (BGI), China

Reviewed by: Lei Zhang, Jiangsu Normal University, China; Xiaojun Nie, Northwest A&F University, China

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Li Guo, li.guo@pku-iaas.edu.cn

This article was submitted to Plant Genomics, a section of the journal Frontiers in Genetics
