Study on Quality Response to Environmental Factors and Geographical Traceability of Wild Gentiana rigescens Franch

Gentiana rigescens Franch. ex Hemsl. is an important medicinal plant in China and the over exploitation of wild resources has affected its quality and clinical efficacy. The accumulation of plant secondary metabolites is not only determined by their genetic characteristics but also influenced by environmental factors. At present, many studies on evaluating the environmental conditions of its planting area are still in the qualitative stage. Therefore, it is necessary to establish a systematic evaluation method to deeply analyze the impact of environmental factors on the quality of medicinal materials and quickly verify the geographical origin. In this study, the contents of five iridoids (loganic acid, swertiamarin, sweroside, gentiopicroside and 6'-O-β-D-glucopyranosylgentiopicroside) of G. rigescens from 45 different origins (including 441 individuals) of Yunnan Province in China were analyzed by high performance liquid chromatography. Analytical procedures of one-way analysis of variance, correlation analysis, principal components analysis, and hierarchical cluster analysis were employed to interpret the correlation of iridoid content and environmental factors. Fourier transform infrared spectroscopy (FT-IR) combined with two multivariate analysis methods (partial least squares discriminant analysis; support vector machines, SVM) was used to discriminate four major producing areas (158 individuals). The combination of SVM with grid search algorithm achieved an accuracy of 100% in the test set. One-way analysis of variance showed that the contents of five iridoids in root tissues of G. rigescens varied significantly among different origins, which was also verified by the chemometrics analysis results of hierarchical cluster analysis. The results of correlation analysis indicated that the high value of altitude and precipitation were unfavorable for the accumulation of these five iridoids. A correlation between increase of temperature and iridoid accumulation was observed. This study provided a certain theoretical basis for the resource protection and development of G. rigescens based on the correlation analysis between the ecological environment factors and quality.


INTRODUCTION
Traditional Chinese medicine (TCM) known as Radix Gentianae, with root and rhizome being the major parts used medicinally. The genuine producing area, which refers to the terminology of daodi in Chinese, of Gentiana rigescens Franch is located in Yunnan Province of China, and medicines produced in this place have higher quality and superior clinical effects than the same species originated from other regions (Zhao et al., 2012). The iridoids gentiopicroside (GE), swertiamarin (ST), and sweroside (SW) are the main active ingredients of G. rigescens. G. rigescens has liver-protecting , cholagogic (Lim et al., 2007), anti-inflammatory (Öztürk et al., 2006;Wang et al., 2013), and anti-oxidant (Jaishree and Badami, 2010) activities, which has long been used in treating inflammation, damp-heat, jaundice, eczema, and itching (National Pharmacopoeia Committee, 2015;Chu et al., 2016). In recent years, the root of Radix Gentiana is found to contain the 2,3-dihydroxybenzoates (gentisides A-B) with favorable neurotrophic and protective effects, making it a potential material as the functional food against Alzheimer's disease (AD) (Gao et al., 2010;Mustafa et al., 2016). As an important natural medicine, G. rigescens is used as a raw material of Chinese drug such as "Long Dan Xie Gan Tang," but also in clinical medicine. Hence, G. rigescens is a kind of plants with remarkable physiological activity, which has shown great value in the development and utilization of medicinal plant resources.
With the deepening of study in recent years, the annual demand for G. rigescens in clinical and health care product industries is up to 40-60 tons , and the related quality standard is also improved. As a result, the wild resources can no longer meet the increasing demand for G. rigescens, which may cause resource depletion problems . At present, progresses have been made in the artificial cultivation technology to protect the original plants from extinction due to the growing demands and excessive harvesting (Tang et al., 2013). However, the blind introduction without considering the ecological suitability also gives rise to a series of problems, such as the uneven quality of medicinal materials produced across different origins and the seriously excessive heavy metals in medicinal materials. Therefore, it is necessary to further examine the quality and trace the geographical origin of wild G. rigescens. Vibration spectroscopy technology, including the Fourier transform infrared spectroscopy (FT-IR), is a convenient quality evaluation approach, which has been widely adopted due to its advantages of rapidness and non-invasiveness (Ma et al., 2018). Moreover, the obtained spectra reflect all sample chemical information compared with that of high performance liquid chromatograph (HPLC) data. Generally, multivariate analysis is utilized to mine the spectral information for better predict the real geographical origin. Besides, the linear partial least squares discriminant analysis (PLS-DA) and nonlinear support vector machine (SVM) have been extensively used.
The active ingredients of medicinal plants are important indexes to evaluate the quality of Chinese medicinal materials . For a majority of plants, the accumulation of secondary metabolites is determined not only by their own genetic characteristics, but also by the ecological environmental factors such as light, water, temperature, and soil moisture (Kawabata et al., 2001;Wu et al., 2014). Plants will determine the species and quantity of secondary metabolites synthesized based on the changes in their own environment (Linhart and Grant, 1996). This can be ascribed to the combined consequence of long-term adaptation and selection to complex environments in plants during the process of evolution, and the environmental choices for plants (Ackerly et al., 2000). As the main producing area of G. rigescens, Yunnan Province has a special geographical environment of low-latitude plateau, and there are various climate types in the distribution area, from north subtropical to south subtropical and marginal tropics (He and Liu, 2001;Zheng et al., 2015). At present, many studies have focused on exploring the differences and similarities of medicinal materials collected from various climate zones, which ignore the influence of environmental factors on the diversity of medicinal plants (Assogbadjo et al., 2006;Seidel et al., 2008). In addition, other factors, including the accumulated temperature and extreme temperature that are monitored and detected in different climate zones of Yunnan Province, are often ignored. Therefore, efforts should be made to explore the connection between environmental factors and the medicinal material. It is of great significance to improve the quality of medicinal materials, standardize the production of G. rigescens medicinal plants and protect the wild resources (Huang et al., 2012).
This study aimed to demonstrate the feasibility of distinguishing the geographical origin of G. rigescens based on the FT-IR spectral information using the PLS-DA and SVM algorithms. The contents of main indicator components were compared in wild G. rigescens from different origins in combination with the ecological environmental factors, so as to illustrate the correlations of quality of medicinal materials with the ecological environmental factors. Moreover, the effects of seven environmental factors on the contents of five iridoids in the root tissues of G. rigescens were investigated through statistical analysis. Besides, principal component analysis (PCA) and correlation analysis (CA) were adopted to reveal the relevance and dependency between environmental factors and medicinal materials. One-way analysis of variance (ANOVA) and hierarchical cluster analysis (HCA) were also performed to explore the similarity of iridoid chemical profiles monitored in the samples. Findings in this study can not only provide a deeper insight into the resource assessment of G. rigescens, but also serve as a reference for the selection and breeding of excellent crops of this species.

Plant Materials and Chemical Reagent
The wild G. rigescens samples in the blooming period were obtained from four prefecture-level cities in Yunnan Province, southwestern China. Under the premise of ensuring sustainable utilization and representative, 45 batch samples (441 individuals) were collected and 5-11 individuals were covered in each batch as independent biological replicates. Among them, 158 individuals of 16 batch samples (No.1,3,7,11,12,14,15,17,(23)(24)(25)(26)(35)(36)(37)(38) were used for geographical traceability research. All samples were identified as Gentiana rigescens Franch. ex Hemsl. by Professor Hang Jin (Institute of Medicinal Plants, Yunnan Academy of Agricultural Sciences). The detailed sample information was shown in Figure 1 and Table 1. All root tissues of G. rigescens were dried at 50°C to consistent weight individually by electric thermostatic drying oven (Experimental Instrument Factory, Shanghai, China). Each dried sample was pulverized by DFY-500 kibbler (Wenling City Forest Machinery, Zhejiang, China) and sifted by a 60-mesh sieve. The fine powders were stored in sealed packets at room temperature until further analysis.
The methanol and acetonitrile of HPLC grade were purchased from Thermo Fisher Scientific (Massachusetts, USA). The formic acid of HPLC grade was provided by DikmaPure (Beijing, China). Analytical grade methanol reagent used for extraction was purchased from Xilong Chemical Company, Ltd. (Chengdu, China). The ultra-pure water that was injected into the HPLC system was purified by a UPTL-II-40L system (Chengdu, China). The reference compounds of loganic acid (LA) and GE were provided by the China Institute for Food and Drug Control (Beijing, China). The 6'-O-b-D-glucopyranosylgentiopicroside (GG) was provided by Shanghai Yihe Biological Technology (Shanghai, China). SW and swertiamain were purchased from Shanghai Shifeng Biotechnology (Shanghai, China). All reference compounds were analyzed by HPLC analysis with the purity of at least 98% and the structures were shown in Figure 2.

Infrared Spectra Acquisition and Preprocessing
A total of 158 dried root powders were analyzed by the Fourier transform infrared spectrometer (FT-IR, Perkin Elmer, USA) equipped with the deuterated triglycine sulfate (DTGS) detector. KBr pellets were made by mixing 1 mg of sample and 100 mg of spectrometry grade KBr for further detection. FT-IR spectra were recorded by 16 co-added scans in the wavenumber range of 4,000-400 cm -1 with a resolution of 4 cm -1 . Each sample was scanned three times in parallel, and the calculated average spectra were used for further analysis. The interference of H 2 O and CO 2 in the air should be deducted when the blank KBr slice was scanned. The collected FT-IR raw spectra data usually contain a lot of noise and interference information. In order to achieve good model performance, different preprocessing and feature variable extraction methods were needed to optimize the dataset. The combination of multiplicative scatter correction (MSC) and second derivative (SD) was selected to reduce the particle size and correct the baseline drift effect (Dhanoa et al., 2017;Zhan et al., 2017). SIMCA-P + 14.1 software was used for performing all pretreatment steps.

Chromatographic Determination
Quantitative analysis was carried out with an Agilent 1260 Infinity HPLC system (GL Sciences Company, Japan) that equipped with a G1311C quaternary pump, an Agilent Intersil-C18 column (150 × 4.6 mm, 5 µm) and the G1329B ALS automatic sampling device connected to a G1315D DAD VL detector. The gradient elution of eluent A (0.1% formic acid aqueous solution) and B (acetonitrile) was performed for the isolation of the target analytes. The flow rate of solvent was 1 ml·min -1 and the gradient elution procedure was as follows: 0-2.5 min, 93-90% A; 2.5-20 min, 90-74% A; 20-29.02 min, 74-41.7% A; 29.02-30 min, 41.7-10% A; and 30-34 min, 10-93% A. There was a four-min balancing time after each run. The column temperature was set at 30°C and the detected wavelength was chosen at 241 nm. Each extraction was analyzed with the same volume of 5 µl.

Preparation of the Sample Solutions and Standard Solutions
Each sample powder (0.0250 ± 0.0005 g) of root tissues was weighted accurately through AR1140 Ohaus electronic analytical   balance (Shanghai, China). Then it was put into a cuvette with stopper and added into 1.5 ml of 80% methanol, treating with the SY3200-T ultrasonic bath (40 kHz) for 40 min (Shanghai Shengyuan Instrument Equipment, Shanghai, China). The extract solution was filtered by the 0.22 µm micro-porous membrane filter before injecting into the HPLC system for analysis. For the calibration standard preparation of five characteristic compounds (LA, GE, GG, ST, and SW), the standard solution was accurately extracted 1 ml and appropriately added with chromatographic methanol to dilute it into a series of consecutive concentrations (0.0044-0.8001, 0.0040-0.7998, 0.0018-0.7683, 0.0498-3.6000, 0.0018-0.7001 mg·ml -1 , respectively). The levels of signal-to-noise ratio (SNR) are set equal to three and ten represented the parameters of limit of detection (LOD) and limit of quantification (LOQ) respectively, which were measured by taking an appropriate amount of mix standard solution and diluting with chromatography methanol to a serial of concentrations mixture. All prepared solutions were stored in a refrigerator at 4°C and reached room temperature before analysis.

Methodological Validation of Quantitative Analysis
The calibration standard curve, LOD, and LOQ were experimentally verified for each standard composition of this method. A mixture solution of five analytes was diluted to the appropriate concentrations for the construction of calibration curves. The repeatability, stability, and precision were experimentally validated by analyzing the mixed standard solutions. As the evaluation of precision in intra-and interday, the mixed standard solution was analyzed six times within a day and 3 consecutive days. The stability was determined by analyzing the mixed standard solutions six times (0,4,8,12,16,20, and 24 h) during a day. For the repeatability, six independent mixed standard solutions were analyzed by repeating the described procedure of mixed standard solutions preparation under this chromatographic condition. The accuracy of chromatographic analysis method was validated by the recovery test. Three different amounts (low, medium, and high) of each standard solution were added into the extraction of sample Nanhua Chuxiong (36). The recovery rates were calculated as follows:

Collection of Environmental Factors
The geographical information including longitude, latitude and altitude were recorded by GPS (HC 608) during collecting samples. All of the climate data were obtained from "China Meteorological Data Network" (http://data.cma.cn/site/index. html), including average annual temperature, 5-7 monthly mean temperature, average annual precipitation, and 5-10 monthly mean precipitation. These environmental factors were chosen according to the growth habit of G. rigescens (Chen, 2001). The information related to environmental factors and chemical data were dealt with by Excel (2013) software and transformed by log10(x) to eliminate the interference of dimension difference. As shown in Figure 1, the geographical distribution of samples was realized by ArcGIS 10 (ESRI Inc., USA).

Multisource Data Analysis
In this study, based on the classical Kernnard-Stone algorithm (Kennard and Stone, 1969), 158 samples from four prefecturelevel cities in Yunnan Province were divided into two-thirds training sets (118 individuals) and one-third test sets (40 individuals). The training sets were used to construct the classification model, and the test sets were used to estimate the classification performance of the model. PLS-DA and SVM algorithms were selected to establish calibration models for 158*1867 original spectral variables under sevenfold cross-validation. At the same time, the effects of preprocessing, feature extraction and their combination of spectral information mining were compared. PLS-DA is a supervised linear classification method based on the theory of the PLS regression algorithm, which contributes to the geographical tracking effectively of G. rigescens. The parameters R 2 Y(cum) and Q 2 (cum) (the cumulative R 2 and Q 2 for the extracted components) are used to predict the degree of data matching with the model. In general, it is considered that the model with Q 2 (cum) greater than 0.5 has better prediction ability. According to the root mean square error of crossvalidation (RMSECV) and root mean square error of prediction (RMSEP), the stability of the model is evaluated. The condition for model fitting required RMSECV to be greater than RMSEP and the model had better stability and prediction ability if the values were smaller (Berrueta et al., 2007). In addition, SVM can achieve better classification results for some complex nonlinear classification problems (Chen et al., 2007;Martelo-Vidal et al., 2013). Then, penalty parameter coast (c) and kernel parameter gamma (g) are important parameters that affect SVM classification performance. Grid search (GS) and Genetic algorithm (GA) are the methods for finding the optimal combination of parameters, which can quickly find the optimal solution in complex parameter space. PLS-DA and SVM models were carried out for geographical traceability of wild G.rigescens by SIMCA-P + (Version 14.1, Umetrics, SWE) and MATLAB (Version 2017A, MathWorks Inc, USA) software.
In addition, PCA, the unsupervised learning algorithm, can reveal the correlation among analytical parameters and visualize the classification trend of samples (Pei et al., 2019). HCA also can be regarded as the common method to uncover the similarities among samples, which calculates the dissimilarity between two patterns through measurement of distance. The nearest data or categories are combined to generate the cluster tree and the smallest distance has the highest similarity. ANOVA is a statistical method used to compare the mean of dependent variables in different groups. CA can be used to find the correlation between two variables. According to the visualization results of the above-mentioned algorithms, the effects of different environmental factors on the chemical components of G. rigescens can be preliminarily explored. In this study, the variable factor plot of PCA was performed by the package of "FactoMineR," which was used to reveal the relationship between environmental factors and the variable of five iridoids. HCA was established to explore the difference and similarities of samples by the package of "Dendextend." The interaction among chemical components extracted from various region samples was investigated by the multiple trials of ANOVA, which was carried out by the package of "Multcomp." CA was achieved by the package of "Hmisc" so as to study the effect of environmental factors on the contents of five iridoids. The analysis of ANOVA, CA, PCA, and HCA were performed by RStudio 3.4 software.

Interpretation of FT-IR Spectra Features
The averaged FT-IR spectra of 158 G. rigescens samples from four different geographical origins were shown in Figure 3. Obviously, the overlapped average FT-IR spectra have some similar characteristic peaks, but there are differences in absorption intensity which implies differences in the accumulation of chemical components from different regions. Compared with other geographical origins, the absorption intensity increased significantly in the root samples of Dali. The obvious peak at 3,400 cm -1 was assigned to the first overtone with O-H stretching. The bands around 2,928 and 1,737 cm -1 represented methylene asymmetric and C = O stretching vibration of esters respectively, which were mainly caused by esters. The band at 1,427 cm -1 represented the asymmetric bending vibration of methyl group. This was the result of carbohydrates or esters. The peak at 1,375 cm -1 showed that the bending vibration of methyl group was mainly caused by esters. In addition, the sharpest peaks at 1,075 and 1,615 cm -1 were the main absorption bands of GE, belonging to C-OH or C-O stretching and C-C asymmetric stretching vibration. Anyway, since the metabolic components of G. rigescens are very similar, multivariate analysis methods were needed for the next analysis (Mi et al., 2015;Zhao et al., 2015).

PLS-DA and SVM Analysis for Geographical Origin Identification
In order to develop a reliable discriminant model, some samples were used to build the model and others were used for validation. It was closely related to the robustness of the model (Ballabio and Consonni, 2013). The results that RMSECV values of all PLS-DA models were greater than RMSEP indicated the reliable model.
To validate the fitting degree of PLS-DA models, the 20iteration permutation test were performed. As shown in Figure  S1 that all permutated Q 2 and R 2 values (left) were lower than the original values (right). It showed that all models were regarded to be suitable without overfitting. However, all Q 2 values less than 0.5 indicated the weak prediction ability. The classification performance of PLS-DA model with preprocessed spectra was better than the original spectral model. The accuracy of training and validation set was 93.22 and 95%, respectively. However, the classification effect among different categories was not ideal in the scatter plot. Despite the combination of preprocessing and feature extraction method, there was still a big confusion between QJ and KM. PLS-DA model was established by selecting the variables with variable importance for the projection (VIP) score greater than 1 for improving classification effect. However, the accuracy of the correction set and the verification set after feature filtering was less than 70%, which showed that this variable filtering method was not suitable for this data. The separation results of calibration set and validation set of all models were shown in Table 2 and Figure S2.
Compared to the method of PLS-DA, SVM had the better performance to deal with nonlinear problems. Thus, SVM was also used to discriminate G. rigescens samples from different producing areas based on FT-IR spectra data in this study. The assign of training and test samples was same as the PLS-DA model. In addition, Radial Basis Function (RBF) was preferred to build SVM model for handling the nonlinear and linear relationships between spectra data (Zhao et al., 2009). To attain a good classification result, penalty parameter c and kernel parameter g were used to optimize the SVM model. The best c and g of 5.6569 and 0.0055 were confirmed by the GS method in the preprocessed model ( Figure 4A). The 83.05% accuracy of training sets and 100% accuracy of test set were acquired in calibration model ( Figure 4B). It meant that 20 samples were misclassified in the training set and all 40 unknown samples were classified correctly in the test set. Tables S4 and S5 provided the confusion data matrix of training set and test set of PLS-DA and SVM models. Compared with the model built directly from the original data, the classification performance was significantly improved. In addition, other parameter optimization methods have been tried to distinguish G. rigescens from different origins in SVM. Figure 4C showed that when the genetic algorithm run to the 10th generation, the average fitness value tended to be stable, while the optimal fitness value kept constant with the increase of iteration times. However, based on the method of GA, the accuracy of test sets was only 52.5 and 62.5% in the original and pre-processed model ( Figure 4D). The classification results of SVM models with different processing methods were shown in Table 3. In conclusion, compared with PLS-DA method, the SVM-GS can show more superior performance for the discrimination of G. rigescens from different producing areas. As well, the selections of spectra pretreatments and some parameters are important for a well calibrated SVM model.

Quantitative Analysis of Five Iridoids
The external standard method was carried out to quantify the contents of five iridoids. All standard solutions were diluted by methanol to an appropriate concentration for establishing the regression equation, which were plotted with seven different concentrations and revealed good linear relationship (R 2 > 0.9991) ( Table S1). The relative standard deviations (RSDs) of precision, stability, and repeatability were reasonable and desirable (RSD < 5%), indicating that the analytical instrument, extraction, and analysis method used in this experiment are reproducible (Table S2). Additionally, the results of recovery test for analyzing the five iridoids were in the range of 97.2-100.1% and the RSD values were inferior to   1.18%, which indicated that this method was creditable for quantitative analysis (Table S3).
In this study, LA, ST, GE, SW, and GG were the main compounds detected in the root tissues of G. rigescens. However, the Chinese Pharmacopeia merely takes the contents of GE as the threshold to evaluate the quality of G. rigescens (National Pharmacopoeia Committee, 2015), which presents with some insufficiency due to the holistic characteristic of TCM. Therefore, finding out the evaluation indicator as many as possible was meaningful for establishing a comprehensive method to evaluate the quality of medicinal materials. In this study, the chromatographic peak of sample obtained from Nanhua Chuxiong (36), Luoping Qujing (23), Eryuan Dali (13), and Xundian Kunming (9) were measured and five iridoids were chosen as the target analytes to evaluate the quality of G. rigescens by comparing and validating the retention times of reference standards ( Figure 5).
One-way analysis of variance was computed to display the difference of iridoids among four origins ( Figure 6). Compared with other four chemical components, the contents of ST exhibited the most fluctuating characters (Figure 7). The highest content (2.84 mg·g -1 ) of ST was detected in sample of Nanhua, Chuxiong (36), which was approximately 29 times than the lowest one (0.10 mg·g -1 ) measured in the sample of Xundian Kunming (7) ( Table 4). For the GG, the highest content (1.39 mg·g -1 ) was measured in the sample of Lufeng Chuxiong (45), which was merely three times than that of the lowest one (0.47 mg·g -1 ) collected from Fuyuan, Qujing (31) ( Table 4). The ratios of the highest content to the lowest content for other three constitutes of SW, LA, and gentiopcroside are 13.0, 4.3, and 3.7 times, respectively ( Table 4). The above-discussed results indicated that the contents of five iridoids measured in the root tissue of G. rigescens was remarkably varied among different origins. Additionally, the GE content ranged from 24.84 to 88.27 mg·g -1 , which was significantly higher than the threshold value of the Chinese Pharmacopoeia (the contents of GE > 1.5 mg·g -1 ). Interestingly, the highest content of these five iridoids was exhibited in the sample obtained from Chuxiong ( Figures 5  and 7 and Table 4). It can be seen that different origins have great influence on the accumulation of five iridoids in the root tissue of G. rigescens.
HCA was performed to expose the similarities of G. rigescens obtained from different origins based on the five iridoid content (Figure 8). The result showed that 45 batch wild samples had a tendency to be divided into two categories. Almost all of the samples obtained from Chuxiong were clustered into one class, which were close to the G. rigescens samples collected from Dali, suggesting that the chemical profiles of G. rigescens between these two origins are similar. However, samples obtained from Kunming and Qujing were disorganized assembled, indicating that the similarity of iridoid content in these two origins was higher than that of samples picked from Chuxiong and Dali. This unsatisfactory classification result for G. rigescens obtained from Kunming and Qujing indicated that the difference of iridoid content between these two prefectures regions was insignificant, coinciding with the result of ANOVA.

Correlation Between Environmental Factors and Five Iridoids
To explore the relationship between environmental factors and the iridoid content, correlation analysis was carried out and the results were shown in Table 5. The LA was significantly affected    by the environmental factors of altitude (P < 0.01) and 5-7 monthly mean temperature (P < 0.05). Besides, iridoids of ST, GE, and GG were influenced by longitude (P < 0.05). Interestingly, component of SW was remarkably affected by almost all the factors including altitude, average annual temperature, 5-7 monthly mean temperature and 5-10 monthly mean precipitation (P < 0.01). However, the altitude environmental factor was negative to the accumulation of these five iridoids, particularly for components of LA and SW (P < 0.01). Altitude showed a significantly negative relationship with the average annual temperature and 5-7 monthly mean temperature (P < 0.01), which was in correspondence with common sense. The information also could be summarized that the average annual temperature and 5-7 monthly mean temperature could promote the accumulation of these five iridoids, especially for SW (P < 0.01). With the increase of precipitation, the contents of these five iridoids decreased, particularly for SW (P < 0.05). Besides, an inner-correlation among different iridoids measured in root tissue of G. rigescens was revealed and results were shown in Table 5.
The LA was significantly positive correlation with ST and GE (P < 0.01). The SW was positive to LA (P < 0.05). ST showed a significantly positive correlation with SW and GE (P < 0.01). GE had a positive correlation with SW (P < 0.05). Findings indicated that the accumulation of these five iridoids might be closely related.
In the present study, PCA was used to reveal the relationship between environmental factors and the five iridoids, which could verify and testify the result of correlation analysis. The variables factors map of PCA was shown in Figure 9. Obviously, the average annual temperature and 5-7 monthly annual temperature were positive for the accumulation of these five iridoids, especially for SW and GG. However, the altitude, average annual precipitation, and 5-10 monthly annual precipitation were adverse for the storage of these five constituents for the root tissue of G. rigescens, remarkably for the components of LA, ST, and GE. The results of variables factors map for PCA were consistent with the correlation analysis, indicating that the results of statistical analysis were reliable.

The Variety of Iridoid Content in Root Tissue of G. rigescens From Different Origins
The species, quantity, and quality of TCM across different producing areas can be quite different. Typically, the intraspecific variation among populations is the fundamental factor affecting the yield and quality of TCM materials (Bowling et al., 2002). In this study, multivariate analysis was carried out to mine the FT-IR spectral information, which was regarded as an effective approach to authenticate the geographical origin of gentian samples. Compared with the PLS-DA model, SVM showed better classification performance, which might be due to the fact that it maximally eliminated the influence of interference information. In addition, based on the quantitative analysis of G. rigescens samples collected from four main origins of Yunnan, different origins greatly affected the accumulation of five iridoids in the root tissue of G. rigescens. Additionally, all these five iridoids with the highest contents were obtained from Chuxiong consistently, indicating that Chuxiong might be one of the best cultivation areas for this Chinese herbal medicine. Meanwhile, according to the analysis of habitat, it was speculated that the high content of iridoids might be induced by the temperature in Chuxiong, where the day/night temperature differences were greatly different, especially for the long-day spring and autumn seasons. Under such condition, more secondary metabolites were accumulated during the daytime, but less were consumed at night. Besides, the precipitation of Chuxiong was lower than that of the other four origins in Yunnan (Akula and Ravishankar, 2011), which was beneficial for producing metabolites, and such hypothesis was confirmed by other studies (Bowling et al., 2002). However, the similarities of samples collected from various locations were  The abbreviations are follow as: A, latitude; B, longitude; C altitude; D, average annual temperature; E, 5-7 monthly mean temperature; F, average annual precipitation; G, 5-10 monthly mean precipitation; LA, loganic acid; GG, 6'-O-b-D-glucopyranosylgentiopicroside; ST, swertiamarin; GE, gentiopicroside; and SW, sweroside, respectively. Significant differences (P < 0.05) marked as "*", (P < 0.01) marked as "**".
insignificant, even under the same climatic zone. Interestingly, the chemical profiles for samples obtained from Chuxiong and Dali showed similar characteristics, which might be attributed to the similar geographical environment in these two origins. Table  1 displays the numerical similarities of annual average temperature in Chuxiong and Dali, and all these values are higher than those in Kunming and Qujing. Compared with the environmental factor of temperature, the annual average precipitation values in Chuxiong and Dali were lower than those in Kunming and Qujing. Significantly, the origin of Chuxiong represented a favorite environment for accumulating the secondary metabolites in the root tissue of G. rigescens, which was also meaningful for the medicinal materials with better quality. Based on the above-mentioned analysis results, it was meaningful to effectively identify the geographical origin to create a cultivation environment similar to Chuxiong. Additionally, it was also significant to build a cultivation base in Chuxiong to satisfy the market demand of G. rigescens and to ensure the quality and efficiency of this medicinal material.

Correlation Between Environmental Factors and Quality of G. rigescens
According to the correlations of seven environmental factors with the quality of G. rigescens, some environmental factors greatly affected the quality of G. rigescens, and there were significant correlations between these factors. In addition, the results of CA and PCA indicated that the average annual temperature and 5-7 monthly mean temperature were profitable for the accumulation of five iridoids, especially for SW. The underlying mechanism might be that the increase in temperature promoted the growth of meristems and organs, thus further accelerating the growth and development of plants (Morison and Lawlor, 1999). Another reason might be that the optimum temperature for metabolizing enzyme was achieved with the increase in ambient temperature (Treimer and Zenk, 1979). As a proof, Krithika et al. revealed that the optimum temperature for 10-hydroxygeraniol dehydrogenase (Cr10HGO) was 30°C, and its enzyme was a critical factor for synthesizing the intermediates in the biosynthetic pathway of iridoid (Krithika et al., 2015). Hence, within an appropriate range, high temperature might benefit the accumulation of iridoid substances in the root tissue of G. rigescens. Unquestionably, the environmental factor of altitude was negatively correlated with the contents of five iridoids, especially for LA and SW. Such phenomenon was possibly illustrated by the CA result that, the altitude showed a remarkably negative correlation with temperature. In other words, a higher altitude resulted in a lower temperature (Kattel et al., 2013). Chen et al. had drawn the same conclusion that the catechin content in oolong tea was negatively correlated with altitude, the content difference induced by temperature was low at a high latitude, and the enzymatic activity declined along with the increase in altitude (Chen et al., 2014). Overall, to achieve the higher quality and yield of G. rigescens raw material, the synergistic effects of environmental factors should be further investigated.
Precipitation is also a crucial factor that affects the quality of herbal medicine. In this study, the average annual precipitation and 5-10 monthly mean precipitation declined, which was beneficial for the accumulation of five iridoids, especially for SW. Jamieson et al. validated that, compared with L. dalmatica that grew in the condition of decreasing available water, the increase in soil available water decreased the average concentration of iridoid glycoside by about 35% (Jamieson et al., 2013). Other studies also find that the appropriately deficit irrigation induces the accumulation of secondary metabolites, such as lycopene and organic acids . Therefore, the cause of such phenomenon may be explained by the observation that, the appropriate drought stress is conducive to the accumulation of secondary metabolites (Chao and Ye, 2013). For instance, the camptothecin content is accumulated under drought stress, while the sustained drought stress decreased camptothecin content in leaf tissue (Liu, 2000). The above-mentioned phenomenon is consistent with the prediction of the growth-differentiation balance hypothesis, which uncovers that the synthesis and accumulation of secondary metabolites may increase under resource limiting conditions (Herms and Mattson, 1992). Based on the above discussion, theoretical results that, appropriate drought stress was meaningful for increasing the contents of secondary metabolites measured in the root tissue of G. rigescens during its cultivation process, was obtained. However, the precise soil moisture and air humidity should be further investigated. On the whole, findings in this study revealed that various environmental factors exhibited a synergistic effect on the growth of G. rigescens. Typically, the average annual temperature, 5-7 monthly mean temperature, the average annual precipitation and the 5-10 monthly mean precipitation played important roles in forming the daodi quality of G. rigescens.
Additionally, results of CA and PCA also uncovered the interrelationships among LA, ST, GE, and SW. Specifically, LA showed significantly positive correlation with the accumulation of ST, GE, and SW. This was because that, LA was the precursor of secoiridoid glycoside (Inouye, 1971;Liu et al., 2017). In other words, a higher content of LA was beneficial for the synthesis of its downstream products. ST also displayed significantly positive correlation with GE and SW. Besides, GE was positively correlated with SW. According to the analysis of structures, ST is the hydroxylated SW and GE can be formed by dehydration of ST (Inouye, 1971;Inouye et al., 1996). This revealed a general trend that, GE was the end metabolic product of these four iridoids, which was corresponding with the quantitative analysis result that the GE content was higher than those of LA, SW, and ST. Besides, Qi et al. confirmed that the root tissue of G. rigescens was the primary part for the accumulation of GE (Qi et al., 2017). Therefore, this quantitative analysis result accorded with the historically drug usage habit to select the root tissue for treating various ailments. On the other hand, high content of GE was accumulated, which might be the favorable defense substance in G. rigescens against environmental stress (Liu et al., 2017).
Our study clearly showed that there was a certain correlation of environmental factors with the quality of medicinal materials, and it might be a suitable method to find an alternative origin of medicinal materials according to the level of environmental factors. It was the reason that the change of environment in plant physiology could induce the physiological change of plants to produce the secondary metabolites (Su et al., 2005). Therefore, the difference of the temperature and precipitation existed between the planting land and the traditional land should be considered in the introduction, which can provide a theoretical basis for obtaining high-yield and high-quality resources. It is a complex process to explore the impact of environmental factors on the quality of traditional Chinese medicine, and single factor analysis is limited by its incompleteness. Thus, it is necessary to establish a systematic evaluation method including the role of a single factor and the interaction of various factors to deeply analyze the impact of environmental factors on the quality of medicinal materials, especially that in daodi origins. The analysis of all environmental factors of daodi medicinal materials is necessary to find the alternative or artificial habitats for herbs. It is our expectation that these findings can provide a basis for revealing the quality formation mechanism of G. rigescens, and provide a certain reference value for the collection, protection, development, and utilization of resources and selection and breeding of new varieties.

CONCLUSION
Due to the non-destructive, fast, and efficient advantages of FT-IR spectra, multivariate analysis methods were used to identify G. rigescens samples from different areas in this study. SVM-GS was considered to be an effective method to trace the geographical origin. In addition, the relationship between five iridoid content in root tissue of G. rigescens and environmental factors under the same climatic zone was explored. Quantitative analysis direct uncovered that sample obtained from Chuxiong has better quality than the other three origins sample according to the highest content of iridoids in root tissue of G. rigescens. Based on the quantitative analysis, results of ANOVA and HCA indicated that the content of five iridoids existed a significant difference in root tissue of G. rigescens obtained from various locations. Results of CA and PCA showed that the climatic factor had a significant influence on the accumulation of LA, ST, GE, and SW in the same climatic zone. A high value of altitude and precipitation was unfavorable for the accumulation of these five iridoids. However, appropriate drought stress was meaningful for increasing the contents of iridoids. On the whole, our results have certain reference significance for the protection and cultivation of G. rigescens. These findings can provide scientific support for tracking geographical origin of G. rigescens and revealing the accumulation and changes of chemical components in different environments.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
LL conducted all laboratory experiments and drafted the article. Z-TZ was responsible for checking the grammar and revising this manuscript. F-RX guided the data analysis and partial interpretation of results. Y-ZW designed the study. All authors have contributed to the article and approved the submitted version.