Characterizing the Leaf Transcriptome of Chrysanthemum rhombifolium (Ling et C. Shih), a Drought Resistant, Endemic Plant From China

Chrysanthemum rhombifolium (Ling et C. Shih), an endemic plant that is extremely well-adapted to harsh environments. However, little is known about its molecular biology of the plant's resistant traits against stress, or even its molecular biology of overall plant. To investigate the molecular biology of C. rhombifolium and mechanism of stress adaptation, we performed transcriptome sequencing of its leaves using an Illumina platform. A total of 130,891 unigenes were obtained, and 97,496 (~74.5%) unigenes were annotated in the public protein database. The similarity search indicated that 40,878 and 74,084 unigenes showed significant similarities to known proteins from NCBI non-redundant and Swissprot protein databases, respectively. Of these, 56,213 and 42,005 unigenes were assigned to the Gene Ontology (GO) database and Cluster of Orthologous Groups (COG), respectively, and 38,918 unigenes were mapped into five main categories, including 18 KEGG pathways. Metabolism was the largest category (23,128, 59.4%) among the main KEGG categories, suggesting active metabolic processes in C. rhombifolium. About 2,459 unigenes were annotated to have a role in defense mechanism or stress tolerance. Transcriptome analysis of C. rhombifolium revealed the presence of 12,925 microsatellites in 10,524 unigenes and mono, trip, and dinucleotides having higher polymorphism rates. The phylogenetic analysis based on GME gene among related species confirmed the reliability of the transcriptomic data. This work is the first genetic study of C. rhombifolium as a new plant resource of stress-tolerant genes. This large number of transcriptome sequences enabled us to comprehensively understand the basic genetics of C. rhombifolium and discover novel genes that will be helpful in the molecular improvement of chrysanthemums.


INTRODUCTION
Chrysanthemum (Chrysanthemum morifolium (Ramat.)Tzvel.; Asteraceae) is among the most popular flowers in China, and the most important cut flowers in the world, having a great ornamental and economical value (Song et al., 2018;Su et al., 2019). However, the long-term artificial domestication of chrysanthemums often causes declines in their resistance to environmental stressors and adaptability (Da Silva, 2003;Chen et al., 2011Chen et al., , 2012Song et al., 2014), thereby limiting their use in landscaping and industrial production. Therefore, the development of Chrysanthemum cultivars with increased environmental tolerance has always been a goal of breeders (Su et al., 2019). Many stress resistance traits, and corresponding stress resistance gene resources identified in the wild chrysanthemum species (Zhao et al., 2009;Lu et al., 2010;Li et al., 2013), have a great significance for the genetic improvement of chrysanthemum cultivars.
RNA sequencing (RNA-Seq) is a powerful tool for quantifying and analyzing different types of RNA molecules using deepsequencing technologies (Wang et al., 2009). It provides us large-scale transcript data with high throughput, accuracy, sensitivity and reproducibility which enabled us to generate an unprecedented global view of the transcriptome of the species (Angeloni et al., 2011;Jain, 2011). RNA-seq has been widely used in plants, especially for some non-model species and some large and complex genomes, greatly accelerating the discovery of novel genes, understanding the complex tissue-specific expression patterns, and regulation networks in higher plants (Li and Dewey, 2011;Wang et al., 2014Wang et al., , 2017Wu et al., 2016).
Chrysanthemum rhombifolium Ling et Shih is a perennial herb endemic to Wushan, Chongqing in China (Shih and Fu, 1983;Bremer and Humphries, 1993) and has a high ornamental value. It has diamond-shaped leaves with dense abaxial pubescence and semi-lignified stems and branches (Figure 1). The species is welladapted to environments characterized by high temperatures, low soil fertility, and drought (Zhao et al., 2009(Zhao et al., , 2010. However, few studies performed on C. rhombifolium except using as a sample in molecular phylogeny of Chrysanthemum (Masuda et al., 2009;Zhao et al., 2010;Li et al., 2014;Ma et al., 2020) or in geographical distribution of Chrysanthemum (Zhao et al., 2009;Li et al., 2013). Here, little is known about its molecular biology of overall plant or the plant's resistant traits against stress. This prompted us to characterize its leaf transcriptome using highthroughput RNA sequencing and de novo assembly to provide a comprehensive resource for understanding the biology of C. rhombifolium in general, and gain insights in improving the breeding of chrysanthemums and other related crops.

Plant Materials
We collected C. rhombifolium plants from Wushan of Chongqing in China and planted them in the Nurse Garden of the Northeastern University, China. Fresh, mature leaves were washed with sterile water, immediately frozen in liquid nitrogen, and stored at −80 • C.

RNA Isolation and cDNA Library Construction
Total RNA was isolated from the leaves using TRIzol reagent (Invitrogen TM Life Technologies, CA, USA) following the manufacturer's instructions. The RNA quality was assessed using formaldehyde denaturing gel electrophoresis (28S:18S>2), a NanoPhotometer R spectrophotometer (IMPLEN, CA, USA), and RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA). For RNA-Seq analysis, three biological replicates were used. Sequencing libraries were generated with 1 µg RNA sample using NEBNext R Ultra TM RNA Library Prep Kit for Illumina R (NEB, USA) following the manufacturer's recommendations. The mRNA was purified from total RNA using beads with Oligo (dT), and cut into short fragments with fragmentation buffer. First-strand cDNA was synthesized using random hexamer primers and M-MuLV Reverse Transcriptase (NEB, USA), and second-strand cDNA was synthesized using buffer, dNTPs, RNase H, and DNA polymerase I. The remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of the 3 ′ ends of DNA fragments, NEBNext Adaptor with hairpin loop structure was ligated to prepare for hybridization. cDNA fragments, preferentially 250-300 bp in length, were selected by purifying the library fragments with AMPure XP system (Beckman Coulter, Beverly, USA). The size-selected, adaptor-ligated cDNA fragments were incubated with 3 µl USER Enzyme (NEB, USA) at 37 • C for 15 min, followed by 5 min at 95 • C before PCR. PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers, and Index (X) Primer. The PCR products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system.

Sequencing and de novo Assembly
We sequenced the transcriptome library using the Illumina HiSeq 2500 platform, and generated paired-end reads. We filtered the raw data using the Filterfq program (BGI, Shenzhen, China) to remove adaptor sequences, reads in which unknown nucleotides (N) were >5%, and low-quality sequences with >20% low-quality bases (quality value ≦10) to generate clean data. The raw data were deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) with the Bioproject accession: PRJNA674029 and BioSample accessions:SAMN16633381-SAMN16633383.

Functional Annotation and Classification of Unigenes
We annotated the obtained unigenes using the NCBI Nr (non-redundant protein database), NCBI Nt (non-redundant nucleotide sequences), Swiss-Prot, Gene ontology terms (GO),  and Protein family (Pfam) using BLAST 2 with an E-value cut-off of 10 −5 to obtain information on protein function annotation. We also performed functional annotation using Clusters of Orthologous Groups of proteins (KOG/COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases to classify possible COG functions and KEGG pathways and predict possible functional classifications and molecular pathways, respectively (Conesa et al., 2005;Ye et al., 2006;Kanehisa et al., 2008).

Functional Annotation and Classification of All Non-redundant Unigenes
Based on the Nr annotation, 42,005 unigenes were assigned to three ontologies and classified into 48 functional GO categories using the Blast2GO software. Of these, 1050 (68.5%), 163 (10.6%), and 320 (20.9%) GO terms were related to cellular components, biological processes, and molecular functions, respectively (Figure 4). The assignment of GO terms in C. rhombifolium in this study focused on "cellular processes, " "metabolic processes, " "single-organism processes, " "cell, " "cell parts, " "macromolecular complex, " "membrane part, " "organelles, " and "binding and catalytic activity, " which reflected the functional gene expression characteristics during its normal growth. This result was similar to those GO terms in some drought-resistance species, e.g., bread wheat, oak, Boea hygrometrica, Boea hygrometrica, and so on (Gupta et al., 2003;Durand et al., 2010;Xiao et al., 2015;Zhu et al., 2015), which mainly due to the selective gene expression caused by various environments and physiological states.

Frequency and Distribution of SSRs
In total, 12,925 SSR regions were identified in 10,524 unigenes. Among the identified SSRs, 128 different motifs were identified, the distribution and frequencies of which are shown in Figure 6. Mononucleotide motifs were the most abundant, and A/T were the largest subset (6,328). Overall, 6,429 mononucleotide, 2,463 di-repeats, 3,694 tri-repeats, 199 tetra-repeats, 56 penta-repeats, and 84 hexa-repeats were found in the C. rhombifolium leaf transcriptome. Among the unigenes containing SSRs, 941 SSRs presented compound formation, and 1,874 contained more than one SSR. On average, one SSR was found every 8.17 kb. The observed number of SSR sequences in our study was higher than EST-SSR ever reports in Chrysanthemum (Wang et al., 2013). The SSR sequences may gain or loss of repeats at a locus in their rapid evolution for adaptation to various environments (King et al., 1997;Trifonov, 2004). The mass EST-SSR loci in C. rhombifolium may be caused by its harsh habitats. These ESTs will provide a valuable repository of abundant information for future functional SSR studies.

Phylogenetic Analysis of GME
Using the annotated sequence of GME in C. rhombifolium and other GME homologs, we constructed a phylogenetic tree among related species. All of the GME sequences from the same taxa were clustered together and GME in C. rhombifolium were grouped into a single clade with the sequences of Helianthus annuus and other Asteraceae species (Figure 7), this result revealed a close relationship of C. rhombifolium and other Asteraceas species, which consistent with the taxonomy based on morphology.

CONCLUSIONS
We obtained 130,891 unigenes from the leaf of C. rhombifolium by NGS transcriptomics, of which 97,496 (∼74.5%) unigenes were successfully annotated in the public protein database. A total of 12,925 SSRs were detected in 10,524 unigenes. This is the first genetic study of C. rhombifolium as a plant resource of stresstolerant genes. These large numbers of transcriptome sequences have enabled us to comprehensively understand the basic genetics of C. rhombifolium and discover novel genes that will be helpful in the molecular improvement of chrysanthemums.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in GenBank. The accession numbers can be found in the article.

AUTHOR CONTRIBUTIONS
YM conceived and designed the experiments. LZ collected the plants. WZ, HX, XD, JL, and JH performed the experiments and analyzed the data. YM, WZ, and LZ wrote the paper. All authors contributed to the article and approved the submitted version.

FUNDING
This project was supported by the National Nature Science Foundation of China (Nos. 31470699, 31872710, and 31770200).