Cross-Species Annotation of Expressed Genes and Detection of Different Functional Gene Modules Between 10 Cold- and 10 Hot-Propertied Chinese Herbal Medicines

According to the traditional Chinese medicine (TCM) system, Chinese herbal medicines (HMs) can be divided into four categories: hot, warm, cold, and cool. A cool nature usually is categorized as a cold nature, and a warm nature is classified as a hot nature. However, the detectable characteristics of the gene expression profile associated with the cold and hot properties have not been studied. To address this question, a strategy for the cross-species annotation of conserved genes was established in the present study by using transcriptome data of 20 HMs with cold and hot properties. Functional enrichment analysis was performed on group-specific expressed genes inferred from the functional genome of the reference species (i.e., Arabidopsis). Results showed that metabolic pathways relevant to chrysoeriol, luteolin, paniculatin, and wogonin were enriched for cold-specific genes, and pathways of inositol, heptadecane, lauric acid, octanoic acid, hexadecanoic acid, and pentadecanoic acid were enriched for hot-specific genes. Six functional modules were identified in the HMs with the cold property: nucleotide biosynthetic process, peptidy-L-cysteine S-palmitoylation, lipid modification, base-excision repair, dipeptide transport, and response to endoplasmic reticulum stress. For the hot HMs, another six functional modules were identified: embryonic meristem development, embryonic pattern specification, axis specification, regulation of RNA polymerase II transcriptional preinitiation complex assembly, mitochondrial RNA modification, and cell redox homeostasis. The research provided a new insight into HMs’ cold and hot properties from the perspective of the gene expression profile of plants.


INTRODUCTION
In the system of traditional Chinese medicine (TCM), Yin-Yang theory is one of the central theories, which is used to explain how the world and body work. Yin and Yang represent the two ends of a spectrum like cold-hot, female-male, and inside-outside. When this concept is applied to the human body, Yin and Yang are linked to different parts or organs of the body or simply one's feeling of cold and hot (Gu and Pei, 2017). Yin and Yang balance explains all changes and activities in nature, including the balance of life and body functions (Chan, 1995). The breaking of balance gives rise to different syndromes, which can be classified as "cold" syndromes and "hot" syndromes. These two types of syndromes therapeutically direct the use of Chinese herbal medicines (HMs) in TCM . The "Yin-Yang" attribution is also used to define the nature of HMs, as HMs are generally featured as cold or hot. Actually, the nature of HMs consists of four types: hot, warm, cold, and cool, based on their interaction with human body. Cool nature usually is categorized as cold nature, and warm nature is classified as hot nature. For example, chewing a mint (Mentha spicata) leaf elicits a cold feeling, while masticating a piece of ginger (Zingiber officinale) root leads to a hot sensation. As a result, mint leaf is considered to be "cold" while ginger root is considered to be "hot" in nature (Zhao et al., 2011). Furthermore, in order to maintain a homeostasis in our body, the hot syndrome disease can be treated with cold nature HMs and cold syndrome disease with hot nature HMs (Chan, 1995). The cold and hot properties are the important medicinal properties of HMs in TCM theory.
The theory of four properties of HMs as an integral part of TCM is rooted from ancient Chinese philosophy. From the modern medical science point of view, it relies largely on the accumulation of experiences and the subjective opinions of TCM practitioners and therefore lacks objective, quantitative measurements and analysis. Since 1970s, with the introduction of modern science and technology, research on the properties of HMs have made some progress, and new ideas and methods have been appeared constantly, including micro-calorimetry (Ren et al., 2009;Jia et al., 2010;Zhao et al., 2011), support vector machine (SVM) (Ung et al., 2007), biophoton radiation detection (Zhao and Han, 2013), predicting system based on chemical material basis (Long et al., 2011), etc. According to different modern scientific instruments and methods, the properties of HMs were summarized by researchers from different perspectives. For example, from the thermokinetic point of view, the herbs that caused the body to release heat owned the "hot" property, and those that absorbed the heat owned "cold" property (Zhao et al., 2011). Some pharmacological studies supported the notion that the cold and hot properties of HMs were closely related to excitability of nervous system and endocrine (Li et al., 2008;Fu et al., 2009), mitochondrial ATP generation, and immunomodulatory function (Ko et al., 2004(Ko et al., , 2006Ko and Leung, 2007). Moreover, research on the functions of HMs' targeted proteins suggested that hot propertied HMs were more related to inflammation and immunity regulation and that cold-propertied HMs possessed tendencies in cell proliferation (Liang et al., 2013). From a chemical point of view, the herbal compounds associated with cold nature generally possessed more polar structures. Their molecular weights were lower, in contrast to the compounds associated with hot nature Huang et al., 2018). This research shows that there are evident differences between cold and hot HMs in the chemical composition or the efficacy and cognitive experience gained from patient response. To date, there is no report on the research of the cold and hot properties of HMs from the perspective of plant gene expression profile based on RNA-seq.
With the advent of high-throughput RNA-seq, there has been a concerted effort on generating a whole transcriptome of plant species Zeng et al., 2010;Wenping et al., 2011;Yuan et al., 2012;He et al., 2013). Transcriptome studies have focused on functional genes and metabolic pathways and provided molecular basis for metabolic pathways of natural drugs (Soetaert et al., 2013;Gao et al., 2014). As previous studies mainly focused on one species, integrating data of multiple species in a reasonable way will be more informative to study the gene function relationship between different properties of HMs. Indeed, cross-species meta-analysis of gene expression profiles has previously been used to address many questions in plants, such as the adaptation to progressive drought stress (Shaar-Moshe et al., 2015). In the present study, a method for cross-species comparison of plant was established to investigate the differences between cold and hot properties of HMs in the active gene level.
Until now, few computational methods have been proposed and developed to analyze interspecies gene expression data. Fisher's combined probability test, which transforms p-values from any number of tests into one single p-value, has been a popular method for comparing multiple gene expression experiments (Campain and Yang, 2010;Tseng et al., 2012;Matur and Ulgen, 2019). An approach was developed to compare gene expression of homologs (identified using reciprocal best BLAST hits) over a wide range of experimental conditions (Stuart et al., 2003). Another method called mDEDS combined several different statistical measures to perform a cross-species comparison of gene expression profiles (Le et al., 2010). Other methods include LOLA (Cahan et al., 2005) and L2L (Newman and Weiner, 2005), which are both online tools for comparisons among different species by ranking lists of differentially expressed genes from microarray studies. These methods were applied to comparisons among species that are related relatively closely in phylogenetic terms, such as mice and humans. Furthermore, a method for comparing gene expression among distant species was developed by taking the homology structure of compared species into account and comparing expression data from genes with any number of orthologs and paralogs (Kristiansson et al., 2013). However, the homology mapping mainly depended on whether these species had available well-annotated genomes (e.g., model species). As HM many plants are less common (non-model species) and distantly related species, such as gymnosperm Ephedra sinica and angiosperm Scutellaria baicalensis, these methods did not seem to be suitable for most species in this study. Therefore, a method for crossspecies comparison of less common and distant species was needed to be established.
Benchmarking Universal Single-Copy Ortholog system (BUSCO 1 ) (Waterhouse et al., 2018) proposed a measure for the assessment of assembled transcriptomes with singlecopy orthologs. Specifically, BUSCO assessment tool used the orthologs of 31 plants from OrthoDB 2 (Zdobnov et al., 2017) to produce 1,440 BUSCO gene sets from plant phylogenetic clades. By using HMMER 3 (Eddy, 2011), hidden Markov model (HMM) profiles were obtained from amino acid alignments built with these 1,440 gene sets. Then, the transcriptome completeness information (C: single and completed; D: duplicated and completed; F: fragmented; M: missing) of plant phylogenetic clades could be assessed, which provided a way to find out the orthologs (C: single and completed) of less common and distant plant species. We could use these orthologs (C: single and completed) to establish our cross-species comparison standard. That is, with the aid of BUSCO, ortholog and phylogenetic information could be taken into account during the gene annotation process. The flowering plant Arabidopsis thaliana is a dicot model organism for research in many aspects of plant biology. The comprehensive annotation of its genome paves the way for understanding the functions and activities of all types of transcripts, including mRNA, the various classes of noncoding RNA, and small RNA (Provart et al., 2016). The maintained database TAIR11 completely recorded its genetic and molecular biology data . Therefore, the main annotation strategy in this study was to use the model organism A. thaliana as the reference to compare any other query species to annotate genes.
In this study, 20 sets of available plant RNA-seq data (HMs with the cold and hot properties) from 20 published papers were collected. We first created a comparison standard for cross-species designation of gene function based on BUSCO results, applied it in BLASTP search, and then annotated their genes. With the specifically expressed gene on hot or cold HMs, we finally performed functional enrichment analysis to identify differences between hot-enriched and coldenriched functional modules. From the perspective of gene expression profiles, the results might provide useful new clues for exploring measurable features of cold and hot HMs with their original definition.

Selection of HMs With Cold and Hot Properties
To obtain sufficient data for HMs, we reviewed the advances in HMs transcriptome studies. Specifically, the databases MEDLINE 3 , Embase 4 , and Google Scholar 5 were searched for articles published up to January 01, 2018. For the information associated with HMs, such as TCM properties, medicinal organs of plants were searched from Chinese Medicine Dictionary (Shanghai Science and Technology Press, Second Edition) and listed as basic information. The collected data were filtered, and HMs datasets only from Illumina paired-end sequencing with similar read lengths were kept for further study. In order to achieve even organ distribution between the two categories of HMs, 5 samples for roots, stems, leaves, and flowers were selected, respectively. In this way, a total of 10 cold and 10 hot HMs were enrolled ( Table 1). The phylogenetic tree of species was constructed by NCBI Taxonomy tools 6 .

De novo Assembly and Assessment of the Transcriptome Assemblies
The RNA-seq data sets were downloaded from the NCBI Sequence Read Archive (SRA 7 ) according to their SRR IDs recorded in 20 articles (Table 1). Then, all reads were processed through a trimming pipeline using Trimmomatic (version 0.32, default parameters) (Bolger et al., 2014) to remove residual adapters, low-quality sequences, and reads below 36 bp. FastQC 8 was used an overview for sequencing quality. The remaining high-quality reads were de novo-assembled into candidate unigenes longer than 200 bp by using Trinity (version 2.6.6, default parameters) (Grabherr et al., 2011). The abundance of each transcript and gene measured by the value of transcripts per million (TPM) was calculated by using the script align_and_estimate_abundance.pl from Trinity software package. Here, RSEM (Li and Dewey, 2011) was called to compute TPM after alignments were done by Bowtie2 (Langmead and Salzberg, 2012).
TransRate (Smith-Unna et al., 2016) was used to evaluate the transcriptome assembly contiguity by producing a score based on length-based and mapping metrics. BUSCO (version 3.0), which evaluates assembly content by searching the assemblies against conserved single copy orthologs found in all Embryophyta, was applied for further quantitative assessment of the assembly completeness.

Cross-Species Gene Annotation
Arabidopsis thaliana protein sequences and annotation information downloaded from TAIR11 were selected as reference. The distances of phylogenetic relationships between A. thaliana and each species in our research varied greatly; some species such as gymnosperm E. sinica and angiosperm S. baicalensis were distantly related. The annotations of genes were assigned by the reference genes through the searching results of BLASTP (Camacho et al., 2009). Among the species, the results would be greatly influenced by the evolutionary distance from reference if uniform cutoff was set, because, at the same sensitivity, the longer distance the species to which the sequence belonged, the less possibility it was to hit the homologous gene. So, it was necessary to establish the reasonable cutoff value for each species separately by considering its evolutionary distance. Meanwhile, the E-value, which considered the size of the database and the scoring system, provided an indication of the statistical significance of a given pairwise alignment. Therefore, E-value was chosen as the cutoff, and it must be that when the phylogenetic relationships between the species and the reference are farther, the looser they are. The process was as follows (Figure 1).
Step 1: Identify species candidate protein sequences. For each species, the transcriptome was assembled de novo from the RNA-seq reads by using Trinity. The sequences were trimmed down to open reading frame by using Exonerate (Slater and Birney, 2005). The longest transcript was extracted for each protein-coding gene locus after confirming the presence of start and stop codons and the proper reading frame. The coding sequences were then translated into protein sequences.
Step 2: Identify orthologs of HM plants conserved genes in Arabidopsis. This step was to classify the candidate conserved genes form 20 plants into BUSCO orthologs and establish gene pairs of HM plants and Arabidopsis.
(I) OrthoDB 2 was a comprehensive catalog of orthologs. In 2016 OrthoDB reached its 9th release, which contained more than 22 million genes from over 5,000 species. BUSCO used OrthoDB to define 1,440 BUSCO sets, which contained gene orthologs from 31 species of plants (including A. thaliana). (II) The HMMER3 HMM profiles from amino acid alignments were used to assess whether a protein sequence from species could join the orthologs of distinct BUSCO set and to classify the matches into four categories of C, D, F, or M. (III) HMMER3 matching was performed on each species separately, and only the newly formed orthologs of C were retained for subsequent steps. Among genes in the same orthologs, the gene from HM species and the gene from A. thaliana were picked up to form a gene pair. Frontiers in Genetics | www.frontiersin.org Step 3: Define BLASTP E-value cutoff values for each species.
(I) In each HM species, BLASTP was performed with default parameter between the protein sequences of A. thaliana and the candidate protein sequences of each species to get the hits and their similarity information including E-value. (II) In each HM species, among the matched gene pair identified by BLASTP, those corresponding to the gene pairs obtained by using BUSCO ( Step 2) were found. Their E-values were collected for 20 species, respectively, and the maximum value of individual species was taken as the E-value cutoff of the species.
Step 4: Another round of BLASTP was run for each HM species by setting the species-specific E-value cutoff yielded from Step 3, again using A. thaliana as reference.
The gene function information was finally assigned according to its matched gene of A. thaliana. It should be noted that one reference sequence might be matched by more than one HM sequence, so there are cases where multiple genes in same species shared the same annotation. Meanwhile, the genes that were expressed only in cold HMs or only in hot HMs were figured out, as well as genes that were expressed only in specific organ, dicotyledons or non-dicotyledons. The reference genes that matched their counterparts only in a certain group of HMs were defined as group-specific genes.
To further validate the reliability of the newly formed orthologs and the cross-species annotation of 20 HMs conserved genes, we conducted two types of analysis accordingly: (1) Another model organism (Oryza sativa Japonica Group) was selected as the outgroup. In each newly formed ortholog ( Step 2), the conserved genes of 20 HMs and the outgroup were collected. These multi-sequence data sets were, respectively, aligned by using MAFFT (v7.305, options -thread -12 -auto) (Katoh and Standley, 2013) and filtered by using trimAl (v1.4 rev10, option -automated1) (Capella-Gutierrez et al., 2009). Then, all the alignments were concatenated, and the maximum likelihood tree was produced by using RAxML (v8.1.2, raxmlHPC-PTHREADS-SSE3 -T 12 -f a -m PROTGAMMAJTT -N 100 -n rodents -s $wd/supermatrix.aln.faa -p 13432 -x 89090) (Stamatakis, 2014). The resulting tree was rooted by newick utilities (v1.6) (Junier and Zdobnov, 2010). The overall validity of conservation for the annotated genes from 20 HMs could be verified by comparing the inferred phylogenetic relationships with the NCBI Taxonomy tree.
(2) Model organism (O. sativa Japonica Group) was selected as another reference by using database Ensembl. The gene function information of 20 HMs' conserved genes was assigned according to its best matched gene of O. sativa by using BLASTP. The reliability of cross-species annotation of 20 HMs' conserved genes was evaluated by comparing consistency of annotation results between using A. thaliana as the reference and using O. sativa as the reference. On the one hand, the proportions of the genes with consistent annotations were manually inspected and counted. On the other hand, genes of A. thaliana matched to HMs were assigned to Oryza sativa using BLASTP. The pairs of genes from A. thaliana and O. sativa were obtained. Then, the proportions of the pairs of which both genes were matched the same HM sequence were calculated.

Gene Function Module Enrichment Analysis
Functional enrichment analysis was performed on group-specific genes by using Metascape 9 , an online bioinformatics pipeline for multiple gene lists, which supports effective data-driven gene prioritization decisions (Zhou et al., 2019). The analysis workflow included (i) ID conversion of input gene identifiers into Entrez gene IDs of A. thaliana; (ii) extraction of annotations for the gene list using GO ids and KEGG Pathways ids; and (iii) functional enrichment analysis through the gene list. Functional categories of GO Molecular Functions, GO Cellular Components, GO Biological Processes GO were applied for the analysis of organ-specific and dicotyledon-/non-dicotyledon-specific genes, and GO Biological Processes and KEGG Pathways were applied for cold/hot specific genes. All genes in the reference genome (A. thaliana) were used as the enrichment background and the filtering criteria for the results were, minimal number of overlap genes ≥ 3, enrichment factor > 1.5, and P-value < 0.01. Remaining significant terms were then hierarchically clustered into a tree based on Kappa-statistical similarities among their gene memberships. A kappa score of 0.3 was applied as the threshold to cast the tree into term clusters. A subset of up to 10 representative terms from each of the 20 top-score clusters were then selected and converted into a network layout. Networks were visualized through Cytoscape (v3.1.2) (Bauer-Mehren, 2013) using 'force-directed' layout with edges bundled for clarity. For identifying enrichment of specific terms of interest, a term with FDR < 0.05 was considered significant.

Statistical Analysis for Between-Group Comparisons
The Illumina sequencing and transcriptome assembly results were compared between cold and hot properties of HMs. For normally distributed data, the homogeneity of variance test was conducted first. T-test was performed for those with homogeneity of variance, and t -test for those with missing variance. For nonnormally distributed data, Mann-Whitney U non-parametric test was performed. All tests were performed using a two-tail hypothesis, with significance set at P < 0.05.

Acquisition of Chemical Components of HMs and Their Metabolic Pathways
All of the chemical ingredients of HMs were collected from BATMAN-TCM (Bioinformatics Analysis Tool for Molecular mechANism of TCM) , an online bioinformatics analysis tool for studying the molecular mechanism of TCM. As components of cold2 (Herba Dracocephali Tangutici), cold6 (Herba Swertiae Mussotii), hot7 (Rhizoma Anemones Flaccidae) and hot10 (Radix Linderae) were not recorded in the database, chemical ingredients for these four HMs were obtained from literature (Zhang et al., 2008;Zuo et al., 2015;Chen et al., 2016;Liu T. et al., 2016;Yang et al., 2016;Xiang et al., 2019;Yixi et al., 2020). For the cold or hot group, the components that were shared by at least two HMs in one group and were not contained in the other group were regarded as the group-specific components. The metabolic pathways of group-specific chemical ingredients were searched through KEGG database 10 .

Statistical Results of HMs High-Throughput Transcriptome Literature Data
Up to January 01, 2018, 159 articles related to HM and transcriptome were identified (Supplementary Table S1). The information collection process was shown in Figure 2A. Among the articles, 113 HMs were screened, including 32 with the hot property (32 with warm property and 0 with hot property), 58 with the cold property (45 with cold property and 13 with cool property), and 23 with a neutral property ( Figure 2C). Of the sampled organs from HMs, the largest proportion was 30.60% in leaves, followed by roots, stems, flowers, and fruits, accounting for 26.12, 23.13, 11.94, and 5.97%, respectively. The proportion of seeds was the smallest, only 2.24% ( Figure 2B). There were four types of platforms used for sequencing reported in the articles, namely Ion Torrent, PacBio, 454, and Illumina ( Figure 2D). Among them, Illumina covered 81% of the total, which came to be the most commonly used sequencing platform for HM transcriptome sequencing, followed by 454 (18%). Illumina's models included Illumina miseq (1.02%), Illumina nextseq (1.53%), Illumina genome analyzer (14.80%), and Illumina hiseq (82.65%).

Selection of HMs With Cold and Hot Properties and de novo Assembly
As shown in Table 1, 10 cold and 10 hot propertied HMs were recruited into the analysis. The selected samples were evenly distributed among organs ( Figure 3B). The phylogenetic tree of our selected 20 species showed that the closest species to the reference specie (A. thaliana) was Isatis tinctoria (cold10), and the farthest was E. sinica (hot6) (Figure 3A). The species in this lineage contained dicotyledonous and non-dicotyledonous plants and were from Magnoliophyta, except for hot6 from Gnetophyta (Supplementary Table S2).
The RNA-seq datasets were downloaded according to their SRR IDs (Table 1). After the removal of adaptor, ambiguous reads, and low-quality reads, the GC content was 42-48%, the percentage of Q20 of each sample was 100%, and the percentage of Q30 was 86.43-97.56% (Supplementary Table S3). 10 https://www.genome.jp/kegg/kegg2.html The FastQC results of each sample before and after quality control were shown on Supplementary Figure S1. All the clean reads from each sample were brought together and assembled de novo by using Trinity, respectively, generating 33,605 ∼ 102,400 unigenes. The average lengths of the unigenes were 577 ∼ 1,101 bp, and the lengths of N50 were 796 ∼ 2,155 bp. The length distribution of unigenes was shown in Supplementary Figure S2. The TransRate assembly score was 0.09014 ∼ 0.53457 and the BUSCO completeness range was 44 ∼ 77.3% (Supplementary Table S4).
The Illumina sequencing and transcriptome assembly results were compared between cold and hot HMs, including reads number, GC content, average length, base sequencing accuracy, total number of reads, total number of bases, mapping rate, BUSCO completeness, TransRate assembly score, and N50. The results showed that except for N50 (P = 0.0432), all the other variables listed above revealed no significant differences between cold and hot groups (p-values > 0.05, Figures 3C-E).

Cross-Species Gene Annotation Results
The abundances of unigenes were comprehensively surveyed. In most species, the number of genes with TPM < 1 [log10(TPM) < 0] accounts for a very small proportion, and the cumulative frequency of genes increases rapidly after TPM < 1 (Supplementary Figure S3). As the computing of TPM had been weighted by library size, and in order to choose genes only with strong expression evidences, TCM > 1 was set as the filtering criterion to collect candidate genes for analyses.
Among expressed genes, some conserved genes were classified into BUSCO sets (Supplementary Data S1/BUSCO result). With these genes in all 20 species, new complete single-copy orthologs were found. Among the gene items of the orthologs, the gene pairs of genes from sample species and a gene from A. thaliana were identified (Figure 1, Step 2-III). Similarity indicators for these gene pairs obtained by BLASTP through the default parameter (Supplementary Data S1/BLASTP default parameter) were identified. For each species, maximum E-value on this list was taken as cutoff for corresponding species and applied to the final annotation. Figure 4A shows the phylogenetic tree of 20 samples, 31 species from BUSCO, and each sample's E-value cutoff, which was converted to "−log10(E-value)." Isatis indigotica (cold10) and A. thaliana were closest in genetic relationship, as they both belonged to Cruciferae. The value of I. indigotica was 114, which was the largest, while Gymnosperms ephedra (hot6) was the farthest species from A. thaliana, with a value of 16, the smallest. The values of Isodon rubescens (hot2), Dracocephalum tanguticum (cold2), S. baicalensis (cold3), Rehmannia glutinosa (cold1), Andrographis paniculata (cold5), Swertia mussoi (cold6), Gentiana rigescens (cold7), Catharanthus roseus (cold4), Gardenia jasminoides (cold8), Atractylodes lancea (hot4), Erigeron breviscapus (hot5), Panax notoginseng (hot9), and Lonicera japonica (cold9), which belongs to Compositae, ranged from 43 to 57. The values of Curcuma longa (hot3), Allium fistulosum (hot1), Pinellia ternata (hot8), and Lindera glauca (hot10), which belonged to Monocotyledonous, far away from A. thaliana, were lower than 35. These results show the tendency that the larger the evolutionary distance between a sample and A. thaliana, the greater the E-value, which made the next round of BLASP for gene annotation more sensitive in species that were distantly related to the reference. By applying the specific E-value as BLASTP cutoff for each species (Supplementary Data S1/BLASTP specific cutoff), there were 1,693-2,152 distinct reference genes matched by genes of 20 samples (Figure 4B), and no significant difference in quantity between hot and cold groups (P = 0.5787 > 0.05). The assembled sequences and their annotations are available by request through email. There were 27,416 proteins of A. thaliana collected in TAIR11; among them, 4,810 (17.54%) in total were hit by HM proteins. In each group (cold/hot) of HMs, the numbers of proteins that hit each single protein of the reference were counted. The numbers of reference genes were counted according to how many proteins were hit ( Figure 4E). The heat map showed that the numbers of reference genes which matched different numbers of the genes of cold or hot propertied HMs were generally evenly distributed on both sides of the diagonal, indicating that there was no obvious cold or hot bias under the annotation process.
Besides, the inferred phylogenetic tree ( Figure 4D) through concatenated conserved gene sequences rooted with O. sativa showed good agreement with the species phylogeny ( Figure 4A). There was minor deviation on hot1 and hot10, which might be due to the distant evolutionary distance relative to other plants. The coincidence rates of the annotation results between using A. thaliana as the reference and using O. sativa as the reference were 85.32-95.39% or 84.20-93.77% (by the BLASTP match or the manual check, Figure 4C). These results showed that the cross-species annotation of conserved genes in this study were relatively reliable.

Functional Enrichment Analysis of Specific Organs
There were 4,810 kinds of genes expressed in one or more organs, and 2,234 (44.46%) were shared by all four organs. The number of kinds of genes specific to organs is 279 for roots, 584 for leaves, 290 for stems, and 376 for flowers ( Figure 5A and Supplementary Data S2/specific genes). Enrichment analysis was performed on genes exclusively expressed in only one organ by using Metascape. The results showed the organspecific functional patterns on these four groups ( Figure 5B and Supplementary Data S2/enriched GO terms), and some enriched items might be explained by knowledge of the function of the specific organ in the organisms. For example, in leaves, chloroplast envelope (GO:0009941), chloroplast stroma (GO:0009570), chloroplast nucleoid (GO:0042644), chlorophyll biosynthetic process (GO:0015995), response to high light intensity (GO:0009644), and short-day photoperiodism (GO:0048572) were detected. Chloroplast envelope, chloroplast stroma, chloroplast nucleoid, and chlorophyll biosynthetic process are related to chloroplast. The variation of light intensity has obvious effects on leaf external morphology, internal anatomy, and physiological characteristics (Feng et al., 2018). Photoperiod has effects on photosynthesis (Mousseau, 1984). In roots, a response to toxic substance (GO:0009636), abscisic acid (ABA)-activated signaling pathway (GO:0009738), and toxin catabolic process (GO:0009407) were detected. Plants are able to release chemical compounds from their roots into the soil. Some of these products are toxic when the roots of neighboring plants take them up (Bonner, 1950). As a result, the response to toxic substance and toxin catabolic process might be a "protective behavior" of roots. ABA, a major abiotic stressresponsive hormone, plays an important role in root hair elongation (Richardson et al., 2009;Wang T. et al., 2017). ABA enhances both auxin transport and auxin biosynthesis in root tips, and ABA and auxin co-regulate a set of genes to promote root hair length . In flowers, enriched function included microsporogenesis (GO:0009556) and an anthocyanin-containing compound metabolic process (GO:0046283). The relationship between microsporogenesis and flower development has been examined in some research. Generally, microsporogenesis and pollen formation are precisely timed and choreographed, and meiosis occurs in a precise chronological order that correlates with the flower bud size (Yang and Kang, 2015). Anthocyanins are commonly found in flowers and the fruits of many plants. Most of the red-, purple-, and blue-colored flowers contained anthocyanins (Khoo et al., 2017). Additionally, significant variation in one metabolite that belongs to "Vitamin" class was putatively identified as [5-Hydroxy-4-(hydroxymethyl)-6-methyl-3-pyridinyl]methyl dihydrogen phosphate in leaves rather than flowers (Sotelo-Silveira et al., 2015). Our results showed that genes mapped to the reference canonical pathways in KEGG with vitamin B6 metabolic function were only found in leaf-specific genes (AT5G53580). Very interestingly, seed development (GO:0048316) in leaves and male gamete generation (GO:0048232) in stems were also found in the results.

Functional Enrichment Analysis of Dicotyledons and Non-dicotyledons
In this study, 5 of the samples were dicotyledons (hot1, hot3, hot6, hot8, and hot10); the other 15 samples (hot2, hot4, hot5, hot7, and hot9 and 10 cold-propertied samples) were non-dicotyledons. The specific genes of dicotyledons or non-dicotyledons were found. The number of specific genes in dicotyledons and nondicotyledons were 1,700 and 315, respectively ( Figure 5C and Supplementary Data S3/specific genes). Several GO functional categories were identified by enrichment analysis (Figure 5D and Supplementary Data S3/enriched GO terms). Among them, regulation of stomatal complex patterning (GO:2000037) was only enriched in dicotyledons. As eudicots were also known as the "three-pore pollen group, " because their pollen had three or more pores, which distinguished them from other angiosperms (Dunn and Campbell, 1965), this identified category might be related to the phenomenon that dicotyledons have more pores. The gene function enrichment analysis results in plant organs, dicotyledons, and non-dicotyledons suggests that our analysis strategy has a certain degree of credibility.

Functional Enrichment Analysis and Comparison Between HMs With Cold and Hot Properties
709 hot-specific and 1,039 cold-specific genes were identified, and the numbers of genes in different groups of HMs matched to reference genes were illustrated by heat map; the red and blue boxes indicate these hot-/cold-specific genes ( Figure 6A). The complete list of cold or hot property-specific genes and the enriched functions was shown in Supplementary Data S4. The gene enrichment analysis results showed that five functional categories were shared among the cold-and hot-propertied HMs, such as protein autophosphorylation (GO:0046777), microtubule-based movement (GO:0007018), etc., meaning that different group-specific genes of these two groups belonged to the same categories that obtained the significant values through the analysis (Figure 6B). In the results, there were also several GO Biological processes and KEGG pathways identified as enriched categories only in the cold HMs or hot MHs. Among the enriched KEGG pathways, flavonoid biosynthesis (ath00941), ABC transporters (ath02010), taurine and hypotaurine metabolism (ath00430), and starch and sucrose metabolism (ath00500) were detected only in the cold HMs, and fatty acid biosynthesis and elongation (M00083) was only in the hot HMs. The searching results of chemical components of HMs and their metabolic pathways (Supplementary Data S5) indicated some links between group-specific chemical ingredients and group-specific enriched KEGG pathways. For example, among cold-specific chemical ingredients, chrysoeriol was shared in cold3 (Radix Scutellariae) and cold9 (Flos Lonicerae); swertiajaponin was shared in cold4 (Herba Catharanthi Rosei) and cold7 (Radix Gentianae); luteolin was shared in cold2 (Herba Dracocephali Tangutici), cold7 (Radix Gentianae), and cold9 (Flos Lonicerae); and paniculatin and wogonin were shared in cold3 (Radix Scutellariae) and cold5 (Herba Andrographis). These five components belonged to flavor biosynthesis (ath900941). In addition, inositol was shared in cold4 (Herba Catharanthi Rosei) and cold9 (Flos Lonicerae), which belong to ABC transporters (ath02010). Similarly, in hot-specific chemical ingredients, lauric acid and octanoic acid were shared in hot3 (Rhizoma Curcumae Longae) and hot6 (Herba Ephedrae); hexadecanoic acid and heptadecane were shared in hot6 (Herba Ephedrae) and hot9 (Radix Notoginseng); and pentadecanoic acid was shared in hot3 (Rhizoma Curcumae Longae), hot6 (Herba Ephedrae) and hot9 (Radix Notoginseng). These five components belonged to fatty acid biosynthesis and elongation (M00083).
The enriched candidate categories were clustered and further formed the networks of functional blocks ( Figure 6C). The clusters with representative terms and the similarity links between the nodes provide a more comprehensive and higher-level visualization for the difference in gene function modules between cold and hot HMs. The terms of clusters belong to independent network blocks that only contained cold-specific genes were embryonic meristem development, embryonic pattern specification, axis specification, regulation of RNA polymerase II transcriptional preinitiation complex assembly, mitochondrial RNA modification, and cell redox homeostasis, while the terms deduced only by hot-specific genes were nucleotide biosynthetic process, peptidy-Lcysteine S-palmitoylation, lipid modification, base-excision repair, dipeptide transport, and response to endoplasmic reticulum stress.

DISCUSSION
Nowadays, theories in cold and hot properties of HMs have become a topic of interest and been researched from different aspects during these years. The relevant research can be divided into two categories based on their research objects. One is to study the physical or clinical responses (thermal behavior, antioxidative activity, etc.) of test subjects (Fu et al., 2015), and the other is to study the chemical materials of herbs themselves (Long et al., 2011). With the rapid development of the next-generation sequencing (NGS), RNA sequencing (RNA-Seq) is widely used in the analysis of transcriptomes of various organisms. For RNA-Seq, improved sequencing throughput and accuracy, shortened sequencing time, and significantly reduced price have opened the door to a better understanding of the sophisticated mechanisms of more species of plants. To date, our research makes the first attempt to study the differences between cold and hot HMs based on their gene expression profiles by taking original material of HMs as research object.
To study the gene function differences between different properties of HMs, cross-species comparison of plants was necessary and informative. Although there has been existing research on comparing gene expression among different species (Kristiansson et al., 2013;Shaar-Moshe et al., 2015), these methods did not seem to be suitable in this study for the less common (non-model species) and distantly related species of HMs. In this study, we established a method for the crossspecies annotation of conserved genes by using transcriptome data of 20 HMs with cold and hot properties. As cross-species gene expression analyses were often hampered by the lack of publicly available genomes and gene orthology information, especially for species that were not common, using the operation procedures of BUSCO, we developed a pipeline for generating species-specific ortholog sets, profiled gene expression by RNAseq, and identified the specific genes grouped by different features (organs, hot/cold properties, and dicotyledons/nondicotyledons).
In order to ensure the reliability of the data sources, we applied strict selection procedures for choosing RNA-seq data included in current study. As HMs are specific parts of plants, which grow in specified areas with specific environment (Huang et al., 2011), all of the datasets were obtained from the exact samples for medicinal purpose. In terms of phylogenetic relationships, the 20 species of samples recruited were scattered within the lineage of plants from BUSCO set ( Figure 4A). As for sequencing platform, all of 20 datasets were generated by Illumina pairedend sequencing with the similar read length. Literature searching and data screening results showed that the research of cold HMs is more popular (Figure 2C), which may be related to its high nephrotoxicity (Liang et al., 2013). Besides, the Illumina sequencing and transcriptome assembly results showed that there was no statistical difference in sequencing throughput, quality (Figures 3C,D), and assembly variables between cold and hot HMs, except for N50 ( Figure 3E). These results suggest that the sequencing and assembly processes were unlikely to cause systemic bias in the follow-up analysis.
As for cross-species gene annotation, the complexity of genes on plant genomes and the incompleteness of gene annotation information makes it almost impossible to compare the expression of all genes across species, as plant genome sizes vary dramatically, ranging from 0.063 to 148.8 Gbp (Greilhuber et al., 2006;Hidalgo et al., 2017); meanwhile, the length of single-copy regions varies widely among plant species (Claros et al., 2012), and the protein-coding gene count is not significantly correlated to genome size (Michael, 2014). As a result, only the conserved genes of the 20 HMs inferred by A. thaliana were annotated and their information was deployed for enrichment analysis in the present study. Besides, relative to sensitivity, accuracy of the annotation was more concerned in this study, which was reflected in the followings. First, only the assembled transcripts with TPM greater than 1 were remained for annotation. Second, the cutoff for BLATSP was strict: 1e-114 for the closest species (cold10) and 1e-16 for the furthest species (hot6). Third, only a small percentage (1.73-5.30%) of unigenes of each HMs and a small percentage (17.54%) of genes on the reference genome matched each other established the links for gene annotation. It should be noted that the numbers of genes for calculation of enrichment parameters were the numbers of genes on reference genome. Thus, the different functional gene modules between groups were the results obtained through the gene function of specific parts of genomic mapped by conserved genes of HMs.
There were three and four types of organs from which cold and hot HMs were sampled, respectively ( Figure 3C). This design was to prevent the difference between cold and hot HMs in the results due to organ specificity. The main purpose of performing enrichment analysis on each type of organs or dicotyledons/nondicotyledons was to assess the effectiveness of the whole computational pipeline. Among the results, expressed genes in leaves were enriched in chloroplast-and chlorophyll-associated functional categories, expressed genes in roots were enriched in toxic and ABA-associated categories, and expressed genes in flower were enriched in microsporogenesis and anthocyaninassociated categories. Meanwhile, regulation of stomatal complex patterning was an enriched category found in dicotyledons.
These findings match what we already know about the specific gene functions of plant organs or dicotyledons (see Results sections "Functional Enrichment Analysis of Specific Organs" and "Functional Enrichment Analysis of Dicotyledons and Nondicotyledons"). Although seed development (GO:0048316) was found in leaves as well as male gamete generation (GO:0048232) in stems, which might be due to the types of organs that are not included the seed and the complex multi-function of genes, the results were still somewhat reasonable under the analysis strategy.
Some of natural products are produced by specific metabolic pathways of plants. These natural products can be served as drugs to modulate molecular networks of humans, which is probably because their corresponding biosynthesis pathways are chemico-biologically associated with the human molecular networks (Zhang B. et al., 2016). In our results, expressed genes in cold HMs were enriched in associated metabolic pathways of flavonoid, taurine, hypotaurine, starch, and sucrose, as well as a pathway for ABC transporters and genes of hot HMs enriched in the pathway of fatty acid biosynthesis and elongation. These were consistent with the metabolic pathways of cold/hot-specific chemical ingredients in plants (see Results sections "Functional Enrichment Analysis and Comparison Between HMs With Cold and Hot Properties"). The results suggested that there might be different patterns of compound metabolism between cold and hot HMs, which might play different roles in affecting the metabolic network of drug users. The Enrichment Network provides a more concise result with regards to the enrichment of expressed gene function ( Figure 6C). For hot HMs, biological processes associated with embryonic development, mitochondrial RNA modification, cell redox homeostasis, and so on were extracted as the hotspecific functional modules. For cold HMs, processes associated with nucleotide biosynthetic and repair, peptidy-L-cysteine S-palmitoylation, lipid modification, dipeptide transport, and response to endoplasmic reticulum stress, were extracted as the cold-specific functional modules. The functional modules and their relationship with the hot/cold property of HMs need to be further verified and interpreted.
Here we describe, to the best of our knowledge, a comprehensive method for the comparison of RNA-seq data among plant species. The novelty should be emphasized. Although previous research has made an effort toward interspecies comparisons, an approach for distant species comparison has not been developed to date. Furthermore, the comparison of RNA-seq data among different species in the present study means that comparisons among huge numbers of species are possible. However, there remains a defect to be discussed. By analyzing suitable transcriptomes of only 10 cold and 10 hot propertied HMs, although the results of specific conserved gene expressing patterns could be obtained, and some could be found reasonable, it was very possible that recruiting more HMs would have impact on the results. Thus, a larger sample size was needed to reach more reliable signals about the difference of functional gene modules between cold and hot HMs.

AUTHOR CONTRIBUTIONS
HW launched ideas and designed the experiments. ArL performed most of the analysis. ArL and HW wrote the manuscript. HW revised the manuscript. AqL visualized the pipeline of cross-species gene annotation and selection of specific expressed genes in Figure 1. JG provided advice and supervised the work. All authors reviewed and approved the final version of the manuscript.

FUNDING
This study was supported by the grant from the Natural Science Foundation of Guangdong Province, China (2018A0303130345 and 2019A1515011255).

ACKNOWLEDGMENTS
We thank Prof. Yuanyan Xiong of Sun Yat-sen University for her helpful discussion.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2020. 00532/full#supplementary-material FIGURE S1 | FastQC results of each sample before and after quality control.       DATA S1 | Cross-species gene expression analysis results.
DATA S2 | Organ-specific genes and enriched GO terms.
DATA S3 | Dicotyledon-specific genes and non-dicotyledon-specific genes and enriched GO terms.
DATA S4 | HMs property-specific genes and enriched categories.
DATA S5 | HMs property-specific compounds and the metabolic pathways.