A Chromosome-Level Genome of the Camphor Tree and the Underlying Genetic and Climatic Factors for Its Top-Geoherbalism

Camphor tree [Cinnamomum camphora (L.) J. Presl], a species in the magnoliid family Lauraceae, is known for its rich volatile oils and is used as a medical cardiotonic and as a scent in many perfumed hygiene products. Here, we present a high-quality chromosome-scale genome of C. camphora with a scaffold N50 of 64.34 Mb and an assembled genome size of 755.41 Mb. Phylogenetic inference revealed that the magnoliids are a sister group to the clade of eudicots and monocots. Comparative genomic analyses identified two rounds of ancient whole-genome duplication (WGD). Tandem duplicated genes exhibited a higher evolutionary rate, a more recent evolutionary history and a more clustered distribution on chromosomes, contributing to the production of secondary metabolites, especially monoterpenes and sesquiterpenes, which are the principal essential oil components. Three-dimensional analyses of the volatile metabolites, gene expression and climate data of samples with the same genotype grown in different locations showed that low temperature and low precipitation during the cold season modulate the expression of genes in the terpenoid biosynthesis pathways, especially TPS genes, which facilitates the accumulation of volatile compounds. Our study lays a theoretical foundation for policy-making regarding the agroforestry applications of camphor tree.


INTRODUCTION
Top-geoherbalism, also known as "Daodi" in China and "Provenance" or "Terroir" in Europe, refers to traditional herbs grown in certain native ranges with better quality and efficacy than those grown elsewhere, in which the relevant characteristics are selected and shaped by thousands of years of the clinical application of traditional medicine (Brinckmann, 2015). The concept of top-geoherbalism is documented in the most ancient and classic Chinese Materia Medica (Divine Husbandman's Classic of Materia Medica; Shen Nong Ben Cao Jing) from approximately 221 B.C. to 220 A.D. (Zhao et al., 2012), which reported that the origin and growing conditions of most herbs were linked to their quality. Historical literature documented cases where the misuse of traditional Chinese medicine (TCM) in prescriptions led to a reduction or absence of therapeutic effects. For example, the application of dried tender shoots of Cinnamomum cassia Presl (a component of Guizhi soup recorded in Prescriptions for Emergencies) from non-top-geoherb regions led to a deficiency in the treatment of fever, while the replacement of this TCM with materials from top-geoherb regions results in proper fever treatment. Currently, TCM from top-geoherb regions accounts for 80% of the market occupancy and economic profits of all TCMs (Huang, 2012), and the significance of top-geoherbs has been revived by the increasing trend of the protection of botanicals with "geographical indication" (GI) (Brinckmann, 2015). Thus, a deep understanding of the pattern and mechanism of topgeoherbalism will strongly guide producers and consumers of TCM and improve the standardization and internalization of the TCM market, especially in the economic context of the Belt and Road Initiative (Hinsley et al., 2020).
A substantial basis of the top-geoherbalism of TCMs is secondary metabolites that play principal roles in the therapeutic effects of herbs. The content of secondary metabolites is a continuous quantitative trait that is determined by three factors: Genotype, environment and the interaction between the genotype and environment. Li Q. et al. (2020) showed that cultivars of opium poppy with similar copy number variations in benzylisoquinoline alkaloid biosynthetic genes were likely to exhibit similar alkaloid contents. To identify the ecological and climatic factors driving the development topgeoherbs, the appropriate strategy is to grow herbs with the same genotype (eliminating the effect of genetic variation) in different environmental settings and study how environmental factors are correlated with secondary metabolites by altering gene expression. However, most published studies confound the effects of genetic variation and environmental factors. For example, Tan et al. (2015) identified metabolite markers for distinguishing Radix Angelica sinensis from top-geoherb regions and nontop-geoherb regions by analyzing the volatile metabolites of processed herbal medicine from drug stores in different regions. The roots of Paeonia veitchii showed higher bioactivities when grown at lower average annual temperatures and high elevations based on the analyses of environmental factors and phytochemicals of different samples from seven populations (Yuan et al., 2020). Liu et al. (2020) revealed a correlation between iridoid accumulation and increased temperature by examining 441 individuals from 45 different origins. The confounding factor of genetic variation led to the misinterpretation of how environmental factors alone affect secondary metabolites. In addition, top-geoherb TCMs have been annotated with their cultural properties, including their cultivation, harvesting and postharvest processing (Huang, 2012), which is outside of the scope of the current study.
Cinnamomum camphora (L.) J. Presl (Figure 1A), also known as camphor tree is native to China, India, Mongolia and Japan (Yoshida et al., 1969;Chen et al., 2017) and was later introduced to Europe and the southern United States (Ravindran et al., 2004;Hamidpour et al., 2012;Bottoni et al., 2021). Camphor trees are divided into five chemotypes according to their dominant volatile oil components as linalool, camphor, eucalyptol, borneol, and isonerolidol types . Their leaves are especially rich in volatile oil components, including monoterpenes, sesquiterpenes and diterpenes (Hou et al., 2020;Tian et al., 2021). Camphor tree has high medicinal, ornamental, ecological and economic value Chen et al., 2020b;Zhang et al., 2020). As a TCM, C. camphora can be used to treat rheumatism and arthralgia, sores and swelling, skin itching, poisonous insect bites, etc. Among the many chemical components of the species, camphor is used as a cardiotonic, deodorant and stabilizer in medicine, daily-use chemical production and industry, and linalool is most frequently used as a scent in 60-80% of perfumed hygiene products and cleaning agents (Letizia et al., 2003;SLI Consulting Inc, 2021).
Despite the economic and evolutionary value of C. camphora, the lack of a high-quality genome for the species has greatly restricted the progress of genetic research and the identification of the biosynthetic genes underlying the production of essential volatile compounds with medicinal effects (Chaw et al., 2019;Chen et al., 2019Chen et al., , 2020dHu et al., 2019;Rendon-Anaya et al., 2019;Lv et al., 2020;Shang et al., 2020). In our study, we de novo assembled the chromosome-scale genome of C. camphora, FIGURE 1 | The camphor tree and landscape of its genome. (A) Images of a camphor tree and its multiple tissues. (i) An ancient camphor tree, (ii) tissue-cultured seedings, (iii) seedings, (iv) young leaves, (v) flowers and (vi) fruits. (B) Circos plot of the C. camphora genome assembly. Circles from outside to inside: (i) chromosomes, (ii) Gypsy LTR density, (iii) Copia LTR density, (iii) total LTR density, (iv) gene density and (v) GC content. These density metrics were calculated with 1 Mb non-overlapped sliding windows. The syntenic genomic blocks ( > 300 kb) are illustrated with orange lines. explored the genomic characteristics of C. camphora and investigated the genetic and climatic factors underlying the topgeoherbalism of this well-known TCM.

Cinnamomum camphora Genome Assembly and Annotation
A haplotype-resolved genome assembly was obtained by using Hifiasm (Cheng et al., 2021) integrating HiFi long reads (24.95 Gb; ∼32.83X based on a previous flow cytometry-based estimated genome size of 760 Mb (Wu et al., 2014)) and Hi-C short reads (63.78 Gb; ∼83.92X) with the default parameters. The two contiglevel haplotype genomes showed total sizes of 768.97 and 752.05 Mb, respectively, after removing redundant sequences with Purge_haplotigs (Roach et al., 2018). Compared with the earlier published C. kanehirae genome (Chaw et al., 2019), the contig N50 was 2.01 Mb for haplotype A and 2.25 Mb for haplotype B; these values are ∼2.2-and ∼2.5-fold higher, respectively, than the values of the previously published C. kanehirae genome (Chaw et al., 2019), respectively ( Table 1). Benchmarking Universal Single-Copy Ortholog (BUSCO) analyses showed that the contig sets of haplotype A and haplotype B contained 94.1 and 96.0% complete sets of the core orthologous genes of viridiplantae, respectively (Supplementary Table 2). Subsequently, the two non-redundant contig sets were anchored to 12 chromosomes based on Hi-C contacts (Supplementary Figure 1). Overall, the assembled genome size was consistent with previous reports (Table 1 and Supplementary Table 3). As the chromosomelevel haplotype A genome was much closer to the estimated genome size than the haplotype B genome, it was used for subsequent analyses.
The final assembled C. camphora genome consisted of 12 chromosomes, 1,025 scaffolds and 1,686 contigs, with a scaffold N50 of 64.34 Mb (Table 1 and Figure 1B). The number of assembled chromosomes was consistent with previous cytological observations. 1 The length of the chromosomes ranged from 84.54 (Chr1) to 36.36 Mb (Chr12) (Supplementary Table 5). The high fidelity of the assembly was corroborated by multiple lines of evidence. First, the mapping rates of RNA-Seq pairedend reads against the assembled genome were high (93-95%). Second, the high completeness of this assembly was supported by a 96.2% BUSCO value ( Table 1 and Supplementary  Table 4), suggesting high completeness at the gene level, which was much better than the completeness of the C. kanehirae genome assembled via PacBio CLR sequencing. Third, the long terminal repeat (LTR) assembly index (LAI) score (Ou and Jiang, 2018) was 13.82, indicating the "reference" level of the genome and reflecting completeness at the transposable element (TE) level.
By combining ab initio prediction, orthologous protein and transcriptomic data, we annotated 24,883 protein-coding genes using the MAKER pipeline (Cantarel et al., 2008; Table 1). The average lengths of the gene regions, coding sequences (CDSs), exon sequences, and intron sequences  Table 6). Long-terminal repeat retrotransposons (LTR-RTs) were the most abundant type of repetitive sequence, accounting for 46.86% of the whole genome, similar to the proportion of LTRs found in C. kanehirae (Table 1). A total of 3,731 intact LTR-RTs were identified, and the frequency distribution of insertion times showed a burst of LTR-RTs 1-2 million years ago (Mya) (Supplementary Figure 2), consistent with previous reports in magnoliids. Gypsy elements were the largest LTR-RT superfamily in C. camphora, constituting 15.42% of the genome. The second largest superfamily was Copia, accounting for 7.69% of the genome. Other unclassified LTR-RTs encompassed 4.83% of the genome. Among DNA retrotransposons, terminal inverted repeat sequences (TIRs) and non-TIRs comprised 14.05 and 5.87% of the genome, respectively.

Phylogenetic and Whole-Genome Duplication Analyses
The concatenate phylogenetic tree constructed with 172 single-copy orthologous genes among 16 species showed that C. camphora was clustered with two other Lauraceae species and that the formed clade was then clustered with other magnoliids (Figure 2 and Supplementary Figure 3A). More importantly, the magnoliids were sister to the combined clade of eudicots and monocots rather than to either monocots or eudicots. In addition, the coalescence-based phylogenetic tree showed the same topology for magnoliids as the concatenation tree (Supplementary Figure 3B). This phylogenetic topology is consistent with some previous studies (Chen et al., 2019(Chen et al., , 2020aHu et al., 2019;Rendon-Anaya et al., 2019) but contrasts with other reports (Chaw et al., 2019;Lv et al., 2020;Shang et al., 2020;Qin et al., 2021). The divergence time between magnoliids and the clade of monocots and eudicots was inferred to be 147.47-170.11 Mya (Figure 2) in MCMCTree with fossil calibration (Yang, 2007), coinciding with the estimates in other studies (Chen et al., 2019(Chen et al., , 2020cHu et al., 2019;Rendon-Anaya et al., 2019). In the clade of magnoliids, the divergence time between Laurales and Magnoliales was approximately 138.96 Mya, after which the divergence of Lauraceae and Calycanthaceae occurred approximately 114.26 Mya, and finally, C. camphora diverged from the most recent common ancestor of C. camphora and C. kanehirae approximately 28.03 Mya.
The intragenomic collinearity and syntenic depth ratio of C. camphora showed clear syntenic evidence of ancient WGD events ( Figure 1B and Supplementary Figure 4). Only the paralogous gene set derived from duplicates produced by WGD was used for the analyses of synonymous substitutions per site (Ks) (see "Materials and Methods" section). Two ancient WGD events shared within the Laurales lineage (C. camphora, C. kanehirae and Persea americana) occurred approximately 85.66 Mya and 137.90 Mya, represented by two signature peaks with Ks values of approximately 0.52 and 0.83 ( Figure 3A). The recent Ks peak (85.66 Mya) occurred much earlier than the divergence time (42.27 Mya) between P. americana and the common ancestor of Cinnamomum, indicating that this round of WGD was shared between Cinnamomum and Persea (Figures 2, 3A). Additionally, the Ks peak (137.90 Mya) occurred earlier than the divergence (114.26 Mya) between Lauraceae species and Chimonanthus salicifolius, implying that this round of WGD was shared between Lauraceae and Calycanthaceae (Figures 2, 3A). We also detected a WGD event in Liriodendron chinense that occurred approximately 116.67 Mya, corresponding to a Ks peak of 0.70 ( Figure 3A), which is consistent with a previous study (Chen et al., 2019).
To confirm that C. camphora underwent two rounds of ancient WGD, we performed intergenomic synteny analyses between the genomes of C. camphora and L. chinense together FIGURE 2 | Phylogenetic analyses. The phylogenetic tree was constructed based on 172 single-copy orthologous genes of 16 species using two ANA-grade species as outgroups; node age and 95% confidence intervals are labeled. Pie charts show the proportions of gene families that underwent expansion or contraction. Predicted whole-genome duplication (WGD) events were only indicated for Laurales and Magnoliales.
To determine the functional roles of the genes retained after WGD events, GO and KEGG enrichment analyses were performed. The results showed that the genes retained after WGD were enriched in basic physiological activities, processes and pathways, such as structural constituents of ribosomes, NADPH dehydrogenase activity, photosynthesis and plant hormone signal transduction, suggesting that two rounds of WGD events enhanced the adaptability of C. camphora to changing environments by improving basic physiological activities and primary metabolism (Supplementary Figures 6, 7 and Supplementary Tables 7, 12).

Tandem and Proximal Duplications Contribute to Terpene Synthesis in Cinnamomum camphora
A total of 18,938 duplicated genes were identified, including 7,043 WGD genes (37.19%), 1,597 tandem duplication (TD) genes (8.43%), 1,079 proximal duplication (PD) genes (5.70%), 5,127 dispersed duplication (DSD) genes (27.07%) and 3,779 transposed duplication (TRD) genes (19.95%) (Supplementary Figure 8). The Ks and Ka/Ks values among different types of duplicated genes were calculated (Figures 3C,D). The Ks values of TD and PD genes were much smaller than those of other types of duplications (Figure 3D), suggesting that TD and PD genes were formed recently. Additionally, the higher Ka/Ks ratios of TD and PD genes ( Figure 3C) implies that they were subject to more rapid sequence divergence.
We also performed GO and KEGG enrichment analyses of the gene sets associated with the five duplication types (Supplementary Figures 6,7 and Supplementary Tables 7-16). The GO results showed that the biological process (BP) and molecular function (MF) categories of secondary metabolite, monoterpene, sesquiterpene and terpene biosynthesis and metabolism were significantly enriched in the TD and PD gene sets, but no enrichment of these GO terms was found in the WGD, DSD and TRD gene sets ( Figure 3E). Regarding the KEGG results, the TD and PD gene sets were significantly enriched in the terpenoid backbone, monoterpenoid, sesquiterpenoid and phenylpropanoid biosynthesis pathways. These secondary metabolite biosynthesis pathways were not enriched in the other three duplication gene sets ( Figure 3F). In addition, KEGG pathways related to responses to environmental stimuli, such as the MAPK signaling pathway, steroid hormone biosynthesis and plant-pathogen interaction, were also significantly enriched in the TD and PD gene sets. All of these results indicated that the rapidly evolving TD and PD genes played essential roles in the synthesis of secondary metabolites, especially monoterpenes and sesquiterpenes, and the response to environmental stimuli in C. camphora.

Metabolic Reflection of Top-Geoherbalism
To evaluate how environmental factors affect metabolite accumulation, we grew tissue-cultured seedlings from a single mother plant in four major production locations, including Qinzhou, Nanning, Baise and Liuzhou, beginning in May 2018. Mature leaves were harvested from the four locations in November 2020 for RNA-Seq and metabolite assessment. Thus, we were able to compare the effects of environmental conditions on metabolite accumulation based on the fixed genotype.
The volatile metabolites of the samples from the four locations exhibited distinct clustering in the PCA plot ( Figure 4C). Notably, the metabolites of samples from Baise and Liuzhou were clearly separated from those of Qinzhou and Nanning according to principal component 1 (PC1), which accounted for 43.2% of the total variance. The heatmap based on the relative abundances of volatile metabolites showed that samples from Baise and Liuzhou exhibited higher abundances than those from Qinzhou and Nanning, especially with regard to terpenes ( Figure 4B and Supplementary Figure 10).
Linalool, borneol, camphor, 1,8-cineole and isonerolidol are the five main phytochemicals determining the medicinal value and top-geoherbalism of C. camphora. A detailed comparison of the five volatile terpenoids implied that samples from Liuzhou showed the highest total abundance, followed by the samples from Baise, while the abundance was lowest in Qinzhou ( Figure 4D). The abundance profiling of camphor and borneol in the four locations echoed the trends of the total abundance of the five volatile terpenoids. The abundance of linalool and isonerolidol peaked in Baise and Liuzhou, respectively, while that of 1,8-cineole peaked in Nanning. Taken together, these results suggested that Baise and Liuzhou are top-geoherb regions of the camphor tree in Guangxi Province.

Modulation of Gene Expression Related to Top-Geoherbalism
TPSs are the rate-limiting enzymes in the production of terpenoids (Chen et al., 2011;Tholl, 2015), including monoterpenes, sesquiterpenes and diterpenoids (see "Materials and Methods" section). The identified TPS genes of C. camphora, C. kanehirae, P. americana, L. chinense and Arabidopsis thaliana were clustered into six clades in the phylogenetic tree, corresponding to the TPS-a, TPS-b, TPS-c, TPS-e, TPS-f, and TPS-g subfamilies (Chen et al., 2011), among which TPS-b and TPS-g encode the enzymes catalyzing the production of 10-carbon monoterpenoids from geranyl diphosphate (GPP), and TPS-a genes are responsible for catalyzing the production of 15-carbon sesquiterpenoids from farnesyl diphosphate (FPP) (Figures 5A,B). The copy number variation of the TPS genes showed that TPS-b was greatly expanded in C. kanehirae and C. camphora (Figures 5A,B), which resulted in the high abundance of monoterpenoids in Cinnamomum (Chen et al., 2020d). Specifically, 44 TPS-b genes were identified in C. camphora, accounting for 61% of all its TPS genes, consistent with the percentage observed in C. kanehirae (63%) and much higher than those in L. chinense (33%), A. thaliana (18%) and P. americana (12%). We also observed more copies of the TPS-g subfamily in C. camphora than in A. thaliana and P. americana. However, no expansion of TPS-a genes was detected in Cinnamomum.
Next, we examined how the TPS genes were distributed across the genome. Chromosome 9 and chromosome 7 harbored the most TPS genes (27 and 25, respectively), followed by chromosome 3 (13 TPS genes) ( Figure 5C and Supplementary  Table 18). Interestingly, genes from the seven subfamilies were observed as tandem duplicates. Two large TPS-b gene clusters were identified on chromosome 9 (21 genes; ca. 38.78-40.12 Mb) and chromosome 7 (10 genes; ca. 12.70-16.03 Mb), respectively. In addition, two large TPS-a gene clusters were detected on chromosome 3 (10 genes; ca. 31.90-32.52 Mb) and chromosome 7 (6 genes; ca. 19.49-19.90 Mb), respectively. The remaining smaller TPS gene clusters were scattered throughout the genome of C. camphora.
Notably, the TPS genes exhibited a strong tissue-specific expression pattern (Figure 5D and Supplementary Table 19), especially TPS-a, TPS-b and TPS-g. We downloaded previously published RNA-Seq data 2 from three tissues (leaf, stem and root) of C. camphora to determine the gene expression profile of TPS genes. As leaves are the major tissue used in medicine, we focused on the TPS genes with high expression levels in leaves. Six TPSa genes and 12 TPS-b genes showed higher expression in leaves than in stems and roots. All of these results indicated that the expansion of the TPS-b subfamily and the tandem duplication of TPS-b genes probably contribute to monoterpenoid biosynthesis in C. camphora.
The biosynthetic pathways of terpenoids are derived from isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP) produced via the methylerythritol phosphate (MEP) and mevalonate (MVA) pathways, respectively Zhou and Pichersky, 2020). Comparative transcriptome analyses of samples from the four locations were performed to examine the expression patterns of genes involved in the MEP and MVA pathways. We combined four differentially expressed gene (DEG) sets identified in Liuzhou vs. Qinzhou, Liuzhou vs. Nanning,  Figure 11). We detected 22 and 23 DEGs in the MEP and MVA pathways, respectively (Figure 6 and Supplementary Table 20). Generally, the DEGs involved in all the steps of the MEP and MVA pathways showed higher expression in Baise and Liuzhou than in Qinzhou and Nanning, except for the 2-C-methyl-Derythritol-2,4-cyclodiphosphate synthase (MDS) gene, which was absent in the combined DEG set. The same expression pattern was observed for the genes related to downstream steps, including isopentenyl diphosphate isomerase (IDI), geranyl diphosphate

Climatic Factors Underlying the Top-Geoherbalism of Cinnamomum camphora
To determine what climatic factors caused the differences in volatile metabolites among the C. camphora plants of the same genotype grown in different locations, we downloaded the climate data of Qinzhou, Nanning, Baise and Liuzhou from 3 years (2018-2020) from the National Meteorological Information Centre. 3 The 17 climatic factors could be classified into temperature-related, wind-related, pressure-related, precipitation-related, humidity-related and sunshine-related factors (Supplementary Tables 21, 22).
PCAs of the climatic factors showed that PC1 (accounting for 53.2% of the total variance) separated Liuzhou from Qinzhou, Nanning and Baise, while PC2 (accounting for 25.8% of the total variance) split Baise from Qinzhou, Nanning and Liuzhou ( Figure 7A). To examine the contributions of climatic factors to PC1 and PC2, we loaded them in the PCA plot ( Figure 7B). Temperature-related factors were the main variables contributing to PC1, including mean minimum temperature (16.88%), mean temperature (16.32%) and mean maximum temperature (14.50%) (Figure 7C), and precipitation-related factors mainly contributed to PC2, including daily precipitation (27.29%) and maximum daily precipitation (19.75%) (Figure 7D).
To examine the detailed differences in temperature-and precipitation-related factors in these four locations, we plotted the monthly observations of mean minimum temperature, mean temperature, mean maximum temperature, daily precipitation and maximum daily precipitation (main contributors to PC1 and PC2) from 2018 to 2020 (Figures 7E,F, Supplementary  Figure 12, and Supplementary Table 22). The mean temperature of Liuzhou was much lower than those of the other locations, especially in the two colder quarters of the year. Additionally, daily precipitation was lower in Baise than in the other locations, especially in early spring. All of these results suggested that relatively low temperature and precipitation during the cold season imposes some degree of stress on the plants and thus stimulated the production of the desired terpenoids in C. camphora.

DISCUSSION
Here, by decoding the genome of the camphor tree, we revealed that magnoliids are a sister group to the clade of eudicots and monocots. Chaw et al. (2019) deciphered the first Cinnamomum genome and found that magnoliids is a sister group to eudicot. However, their study was only included one magnoliid species to generate single-copy orthologs and construct the phylogenetic tree. Our sample cohort contains five magnoliids (more magnoliids genome sequences have been published since 2019), and the single-copy orthologs identified are more valid and robust for magnoliids. It is worth mentioning that Qin et al. (2021) found the placement of magnoliids as sister to monocots based on the ancient genomic rearrangements. Taken together, we think the phylogenetic placement of magnoliids is still inconsistent and awaits further investigation. The two rounds of WGD identified in C. camphora were dated to ca. 85.66 and 137.90 Mya, respectively. The former was shared with Lauraceae species, and the latter occurred before the divergence of Lauraceae and Calycanthaceae. We found that rapidly evolving TD genes play key roles in the synthesis of secondary metabolites, especially for monoterpenes and sesquiterpenes. By analyzing volatile metabolites in leaves sampled from plants of the same genotype grown in four different locations, we found higher accumulation of the key volatile metabolites in regions with lower temperature and precipitation in the cold season, which was attributed to the differential expression of genes related to the MVA and MEP pathways as well as their downstream steps in terpene biosynthesis. Our study confirmed that abiotic stress contributes to the development of top-geoherbs, laying a theoretical foundation for policy making on the agroforestry applications of camphor trees.
In addition to releasing the first chromosome-level genome of this significant economic tree species, our study is innovative in two regards. First, it is among the very few innovative threedimensional evaluations of the top-geoherbalism of a TCM species, integrating phytochemical, genetic and climatic analyses of camphor tree germplasm . Second, our study is the first to fix the genotypic variation of the germplasm so that it is feasible to investigate how climatic factors alone affect the accumulation of metabolites Mandim et al., 2021). By integrating these two novel approaches, our study provides an example of how to comprehensively scrutinize the genetic and climatic factors affecting the composition of secondary metabolites.
The dominant climatic factors affecting the secondary metabolites of the camphor tree are not always the key factors affecting in other traditional herbs. For example, humidity and sunshine time are the chief limiting factors in the production of artemisinin in Artemisia annua L. . Additionally, lower temperature in the coldest quarter imposes stress on camphor tree and increases the production of desired volatile compounds. In other cases, a higher temperature imposes stress and enhances the production of desired metabolites . The limiting ecological and climatic factors depended on the native ranges of the herbs and climatic conditions during their growing season (Korner, 2021). A thorough investigation of each specific case would be beneficial for decision making.
Some caveats need to be considered when interpreting our results. First, broader planting of the same genotype (i.e., covering all provinces in South China and even some Southeast Asian countries) would provide a thorough understanding of the optimal climatic variables resulting in the greatest consistency and efficacy of the medicinal components of camphor tree (Sharma et al., 2020). Second, the further experimental verification of the effect of temperature on metabolite contents and the functional validation of selected candidate genes (such as TPS-b, TPS-g, of TPS-a) of the biosynthetic pathway under controlled laboratory conditions would enhance our conclusions (Lau and Sattely, 2015;Nett et al., 2020). Third, soil characteristics (including nutritional status, humidity and rhizosphere microorganisms, etc.) are crucial for the accumulation of secondary metabolites (Ciancio et al., 2019;de Vries et al., 2020). However, owing to the absence of appropriate data for assessing these characteristics, we overlooked ecological factors such as soil conditions. Further studies addressing the abovementioned limitations would provide an in-depth framework for understanding the genetic and environmental factors related to top-geoherbalism.
Taken together, our results lay a theoretical foundation for the optimal production of this economically significant tree species. The distributional range of camphor tree largely overlaps with the land and maritime portions of the silk road of the Belt and Road Initiative (Hinsley et al., 2020); thus, products obtained from camphor tree, such as TCMs or essential oil, show a high chance of being exported to connected countries in the Middle West, Western Europe and even North Africa. As tissue culture methods for camphor trees are well established and the growth rate of the trees is relatively fast (Shi et al., 2010), the commercial cultivation of trees ex situ is probably a sustainable and economic method in addition to importation (Brinckmann, 2015). Thus, our research will provide guidance for policy making, genotype selection and the optimization of climatic conditions for growth by domestic and international government stakeholders, farmers, merchandisers and consumers.

Plant Materials
The sequences utilized for de novo genome assembly of the genome were obtained from the fresh leaves of a single camphor tree (C.

DNA Sequencing
High molecular weight (HMW) genomic DNA was extracted by using a DNeasy Plant Mini Kit (Qiagen, United States), and 50 µg of the HMW DNA was used to generate SMRTbell TM libraries. The circular consensus sequencing (CCS) data were then produced on the PacBio Sequel II platform. Hi-C libraries were constructed from the tender sprouts of C. camphora with fragments labeled with biotin, and then sequenced based on the Illumina NovaSeq 6000 platform.

RNA Sequencing
Total RNA was extracted from each sample by using an RNAprep Pure Plant kit (TIANGEN, Beijing, China). cDNA was synthesized from 20 µg of total RNA using Rever Tra Ace (TOYOBO, Osaka, Japan) with oligo (dT) primers following the user manual. RNA sequencing was also performed on the Illumina NovaSeq 6000 platform.

De novo Genome Assembly and Quality Assessment
The C. camphora genome was assembled by integrating the sequencing data obtained with PacBio CCS and the Hi-C technology using Hifiasm (Cheng et al., 2021). We assembled two contig-level genomes, including a monoploid genome (haplotype A) and an allele-defined genome (haplotype B). Before Hi-C scaffolding, we filtered the redundant contigs from the contiglevel genomes by using Purge_haplotigs (Roach et al., 2018). The Hi-C reads were assessed with the HiC-Pro program (Servant et al., 2015). Juicer tools (Durand et al., 2016) and 3D-DNA pipelines (Dudchenko et al., 2017) were used to perform chromosome scaffolding. The BUSCO (Seppey et al., 2019) method was used to evaluate the completeness of the chromosome-level genomes.

Repetitive Sequences and Gene Annotation
We used the EDTA pipeline (Ou et al., 2019) to identify transposable elements in the C. camphora genome, including LTR, TIR and non-TIR elements. LAI assessment and LTR insertion time estimation were performed by LTR_retriever (Ou and Jiang, 2018) with the synonymous substitution rate set as 3.02e-9 (Cui et al., 2006). TRF software (Benson, 1999) with default parameters was applied to annotate tandem repeats.
For gene model annotation, we trained ab initio gene predictors, including AUGUSTUS (Stanke and Morgenstern, 2005) and SNAP (Korf, 2004), on the repeat-masked genome using a combination of protein and transcript data. We used the annotated proteome data of A. thaliana, L. chinense (Chen et al., 2019), C. kanehirae (Chaw et al., 2019) and P. americana (Rendon-Anaya et al., 2019) and the Swiss-Prot database as the protein data. We employed RNA-Seq data from the four locations as the transcript data. To train AUGUSTUS, BRAKER2 (Bruna et al., 2021) was applied with the transcript data from aligned RNA-Seq bam files produced by Hisat2 (Kim et al., 2015). SNAP was trained under MAKER (Cantarel et al., 2008) with two iterations. Transcript data were supplied in the form of a de novo assembled transcriptome generated in Trinity (Haas et al., 2013) and a reference-based assembly generated by StringTie (Pertea et al., 2016). After training, the AUGUSTUS and SNAP results were fed into MAKER again along with all other data to produced synthesized gene models.
Functional annotations of the protein-coding sequences were obtained via BLASTP ("-e-value 1e-10") searches against entries in both the NR and Swiss-Prot databases. The prediction of gene sequence motifs and structural domains was performed using InterProScan (Jones et al., 2014). The annotations of the GO terms and KEGG pathways of the genes were obtained from eggNOG-mapper (Huerta-Cepas et al., 2017).

Phylogenetic Analyses and Estimation of Divergent Times
A total of 16 plant species, including five magnoliids (C. camphora, C. kanehirae, P. americana, Ch. Salicifolius, and L. chinense), four monocots (Zea mays, Oryza sativa, Ananas comosus, and Musa acuminata), five eudicots (A. thaliana, Populus trichocarpa, Medicago tuncatula, Solanum lycopersicum, and V. vinifera) and two ANA-grade species (Nymphaea colorata and Amborella trichopoda) were used to infer the phylogenetic tree. All genomes except for that of Ch. salicifolius 4 were downloaded from JGI 5 and Ensembl Plants. 6 Paralogs and orthologs were identified among the 16 species by using the OrthoFinder pipeline (Emms and Kelly, 2019) with the default parameters, and the protein sequences of the 172 identified single-copy orthologous genes were used for the construction of the phylogenetic tree. The concatenated amino acid sequences were aligned with MAFFT (Katoh and Standley, 2013) and trimmed with trimAI (Capella-Gutierrez et al., 2009). A maximum likelihood (ML) phylogenetic tree was constructed using IQ-TREE (Nguyen et al., 2015) with ultrafast bootstrapping (-bb 1,000), using N. colorata and A. trichopoda as outgroups. The best-fitting substitution models were selected by ModelFinder (Kalyaanamoorthy et al., 2017). In addition, ASTRAL-III v5.7.3 (Zhang et al., 2018) was applied to infer the coalescence-based species tree with 172 gene trees. The species tree was then used as an input to estimate the divergence time in the MCMCTree program in the PAML package (Yang, 2007). The calibration time was obtained from TimeTree (Kumar et al., 2017): (1)

Genome Duplication and Syntenic Analyses
To identify the pattern of genome-wide duplications in C. camphora, we divided duplicated genes into five categories, WGD, TD, PD (duplicated genes separated by less than 10 genes on the same chromosome), TRD, and DSD (the remaining duplicates other than the four specified types) gene, using DupGen_Finder (Qiao et al., 2019) with the default parameters. The Ka, Ks, and Ka/Ks values were estimated for duplicated gene pairs based on the YN model in KaKs_Calculator2 , followed by the conversion of amino acid alignments into the corresponding codon alignments with PAL2NAL (Suyama et al., 2006). The genes in the five duplicate categories were further subjected to GO and KEGG analyses with the R package clusterProfiler (Yu et al., 2012). The enriched items were selected according to an FDR criterion of 0.05. The dating of ancient WGDs and ortholog divergence were estimated based on the formula T = Ks/2r, where Ks refers to the synonymous substitutions per site, and r (3.02e-9) is the synonymous substitution rate for magnoliids (Cui et al., 2006).
Genomic synteny blocks to be employed for intra-and interspecies comparisons among magnoliids were identified with MCscan software (Tang et al., 2008). We performed allagainst-all LAST analyses (Kielbasa et al., 2011) and chained the LAST hits according to a distance cut-off of 10 genes, requiring at least five gene pairs per synteny block. The syntenic "depth" function implemented in MCscan was applied to estimate the duplication histories of the respective genomes. Genomic synteny was visualized with the Python version of MCscan, the R package RIdeogram (Hao et al., 2020) and Circos (Krzywinski et al., 2009).

Gene Expression Profiling
The raw RNA-Seq data were filtered by using FASTP (Chen et al., 2018). The clean data were aligned to our assembled C. camphora genome with Hisat2 (Kim et al., 2015), and the quantification of gene expression was calculated with StringTie (Pertea et al., 2016). The Python script preDE.py built into StringTie was used to convert the quantification results into a count matrix. DEGs were detected with the DESeq2 R package (Love et al., 2014) with an FDR < 0.05.

TPS Gene Family
TPS genes are classified into seven subfamilies, including TPSa, TPS-b, TPS-c, TPS-d, TPS-e, TPS-f, and TPS-g (Chen et al., 2011). The TPS genes included in the TPS-a subfamily are responsible for forming 15-carbon sesquiterpenoids. The TPSb and TPS-g superfamilies encode the enzymes producing 10-carbon monoterpenoids. The TPS-c, TPS-e, and TPS-f subfamilies encode diterpene synthases, which catalyze the formation of 20-carbon isoprenoids. The TPS-d subfamily is gymnosperm specific and encodes enzymes involved in the production of 20-carbon isoprenoids (Martin et al., 2004). The TPS genes were predicted based on both their conserved domains (PF01397 and PF03936) and BLAST analyses. Conserved domains were used as search queries against the predicted proteome using hmmsearch in HMMER. 7 TPS protein sequences from A. thaliana and C. kanehirae were used as queries to identify the TPS genes of C. camphora, P. americana, L. chinense, and V. vinifera. The protein sequence hits of TPS genes were aligned with MAFFT (Katoh and Standley, 2013) and trimmed with trimAI (Capella-Gutierrez et al., 2009). The TPS gene tree was constructed using RAxML (Stamatakis, 2014) with 1,000 bootstrap replicates. The TPSc subfamily was designated as the outgroup. The distribution of TPS genes on the chromosomes was visualized in TBtools (Chen et al., 2020a).

Determination of Volatile Metabolites
Six biological replicates were sampled in each location. The samples were ground into powder in liquid nitrogen. Two grams of the powder was transferred to a 20 ml headspace vial. The vials were sealed using crimp-top caps with TFEsilicone headspace septa. In solid-phase microextraction (SPME) analyses, each vial was placed at 60 • C for 12 min, and a 65 µm divinylbenzene/carboxene/polydimethylsilioxane fiber (Supelco, Bellefonte, PA, United States) was then exposed to the headspace of the sample for 30 min at 60 • C. The desorption of the volatile metabolites from the fiber coating was carried out in the injection port of the GC apparatus at 250 • C for 10 min in spitless mode. The identification and quantification of volatile metabolites were carried out with a 30 m x 0.25 mm × 1.0 µm DB-5MS (5% phenyl-polymethylsiloxane) capillary column based on the Agilent 7890B-7000D platform. Helium was used as the carrier gas at a velocity of 1.0 ml/min. The temperature of oven was programmed to increase from 40 at 5 • C/min until it arrived at 280 • C. The quadrupole mass detector, ion source and transfer line temperatures were set at 150, 230, and 280 • C, respectively.

Climatic Factor Collection and Analyses
Data on 17 climatic factors from Qinzhou, Nanning, Baise and Liuzhou from 2018 to 2020 were downloaded from the National Meteorological Information Centre (see footnote 3). The analyses of the climatic factors were performed in the R package FactoMineR (Lê et al., 2008).

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
LW, CL, KL, and RJ conceived and designed the study. RJ, XC, CZ, PW, and KL prepared the materials. CL, XC, XL, DP, and XH performed data analyses. RJ, XC, DH, CL, and LW wrote the manuscript. All authors read and approved the final draft.