Discovery of Quality Markers of Nucleobases, Nucleosides, Nucleotides and Amino Acids for Chrysanthemi Flos From Different Geographical Origins Using UPLC–MS/MS Combined With Multivariate Statistical Analysis

Nucleobases, nucleosides, nucleotides and amino acids, as crucial nutrients, play essential roles in determining the flavor, function and quality of Chrysanthemi Flos. The quality of Chrysanthemi Flos from different geographical origins is uneven, but no study has yet screened its quality markers on the basis of nutritional ingredients. Here, we developed a comprehensive strategy integrating ultra performance liquid chromatography coupled with triple-quadrupole linear ion-trap tandem mass spectrometry (UPLC–MS/MS) and multivariate statistical analysis to explore quality markers of Chrysanthemi Flos from different geographical origins and to evaluate and discriminate samples by origin. First, a fast, sensitive, and reliable UPLC–MS/MS method was established for the simultaneous quantification of 28 nucleobases, nucleosides, nucleotides and amino acids in Chrysanthemi Flos from nine different regions in China. Chrysanthemi Flos from all nine cultivation regions was rich in these 28 nutritional components, and their contents differed markedly; however, correlation analysis showed that altitude was not the main factor behind these differences, which requires further investigation. Subsequently, eight crucial quality markers for the nine geographical origins of Chrysanthemi Flos, namely 2′-deoxyadenosine, guanosine, adenosine 3′,5′-cyclic phosphate (cAMP), guanosine 3′,5′-cyclic monophosphate (cGMP), arginine, proline, glutamate and tryptophan, were screened for the first time using partial least squares discriminant analysis (PLS-DA) and the variable importance in projection (VIP) plot. Moreover, a hierarchical clustering analysis heat map was employed to visualize the distribution of the eight quality markers across the nine regions.
Finally, based on the contents of the eight selected quality markers, a support vector machine (SVM) model was established to predict the geographical origins of Chrysanthemi Flos, yielding excellent prediction performance with an average prediction accuracy of 100%. Taken together, the proposed strategy is suitable for discovering quality markers of Chrysanthemi Flos and could be used to discriminate its geographical origin.

According to their cultivation regions, Chrysanthemi Flos is marketed as Boju (BJ), Chuju (CJ), Gongju (GJ), Huaibaiju (HBJ), Sheyanghangbaiju (SYHBJ), Tongxianghangbaiju (TXHBJ), Fubaiju (FBJ), Qiju (QJ), and Jiaju (JJ); these products carry specific geographical indications and are used indiscriminately for medicinal and tea purposes. Notably, BJ, CJ, GJ, HBJ, and Hangbaiju (SYHBJ and TXHBJ) have been officially recorded in the Chinese Pharmacopoeia (2020 edition) under the same monograph, "Juhua", as standard varieties of Chrysanthemi Flos. Although the Chinese Pharmacopoeia applies the same evaluation criteria to Chrysanthemi Flos from all cultivation regions, market prices vary greatly among geographical origins because of quality differences. Moreover, previous studies have shown that the geographical environment greatly influences the chemical composition and quality of Chrysanthemi Flos (Xie et al., 2012; Du et al., 2015; Han et al., 2015). Currently, only chlorogenic acid, 3,5-O-dicaffeoylquinic acid and luteolin-7-O-β-D-glucoside are quantified as marker compounds for the quality control of Chrysanthemi Flos in the Chinese Pharmacopoeia. To evaluate the quality of Chrysanthemi Flos from different geographical origins, many holistic chemical profiling methods have been applied, such as liquid chromatography-mass spectrometry (LC-MS) for flavonoids and phenolic acids (Ding et al., 2016; Nie et al., 2018; Zhang et al., 2018) and gas chromatography-mass spectrometry (GC-MS) for volatile compounds (Luo et al., 2017; Nie et al., 2018; Zhang et al., 2018). Clearly, flavonoids, phenolic acids, and volatile oils have been the main focus of most studies.
However, as far as we know, there have been no reports on the quality evaluation of Chrysanthemi Flos from different geographical origins based on the nutritional compounds such as nucleobases, nucleosides, nucleotides, and amino acids.
Nucleobases, nucleosides, nucleotides, and amino acids are crucial nutritional and functional compounds that exhibit diverse bioactivities, including immunoregulatory, antioxidative, anti-obesity, antihypertensive, and anticancer effects (Wu, 2013; Chang et al., 2020). Besides their health benefits, previous studies have demonstrated that some of these nutrients are positively correlated with the flavor and quality of tea (Yang et al., 2018; Yu and Yang, 2019). For example, Anji white tea is a popular high-quality tea in China because of its abundant amino acids (Yu and Yang, 2019). More importantly, nucleobases, nucleosides, nucleotides and amino acids have been screened as quality markers for several Chinese herbal medicines (CHMs) and functional foods, such as Mactra veneriformis, Ziziphus jujuba, royal jelly and Angelica sinensis (Qu et al., 2019). In our previous study, glutamate, asparagine and aspartate were selected as key quality markers for Chrysanthemi Flos at three different flowering stages (Chang et al., 2020). Therefore, discovering quality markers among these nutritional compounds will facilitate the effective use and quality control of Chrysanthemi Flos from different geographical origins.
In the present study, an integrated strategy of ultra performance liquid chromatography coupled with triple-quadrupole linear ion-trap tandem mass spectrometry (UPLC-MS/MS) and multivariate statistical analysis was developed to reveal the quality markers of Chrysanthemi Flos from different geographical areas based on nutritional compounds. First, a UPLC-MS/MS method to simultaneously quantify 28 nucleobases, nucleosides, nucleotides, and amino acids in Chrysanthemi Flos from nine different geographical origins was established and validated. On the basis of these 28 nutrients, multivariate statistical methods, including principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA), were applied to discover quality markers of Chrysanthemi Flos from the nine cultivation regions. A hierarchical clustering analysis heat map was then employed to visualize the distribution of the selected quality markers across the nine geographical origins. Moreover, a support vector machine (SVM) model was established to discriminate and predict the geographical origins of Chrysanthemi Flos using the selected quality markers. In summary, this study provides a new strategy for the quality assessment of Chrysanthemi Flos from nine different geographical origins, and of other medicinal herbs or foods, using nutritional ingredients as quality markers.
The samples of Chrysanthemi Flos were collected from nine different geographical origins in China from October to November 2018; detailed information on all samples is given in Figure 1.

Preparation of Standard Solutions
The standard stock solutions were prepared by dissolving approximately 1 mg of each reference compound in 10 ml of H2O. According to the results of preliminary experiments, appropriate volumes of the 28 standard stock solutions were mixed and diluted with water to obtain mixed reference working solutions at eight non-zero concentration levels for constructing the calibration curves. Three independent replicate analyses were performed at each concentration level by UPLC-MS/MS. Prior to analysis, all standard solutions were filtered through a 0.22 µm cellulose membrane filter and stored at 4°C.

Preparation of Sample Solutions
The dried samples were pulverized into homogeneous powders (80 mesh). Each precisely weighed sample powder (1.0 g) was placed in a 50 ml glass-stoppered conical flask containing 40 ml of distilled water. Each mixture was accurately weighed and sonicated (100 Hz, 25°C) for 60 min, after which water was added to compensate for the water lost during extraction. After centrifugation (13,000 r/min, 15 min), the supernatants were kept at 4°C and filtered through 0.22 µm cellulose membrane filters before injection into the UPLC-MS/MS system. Ten replicates of Chrysanthemi Flos were prepared for each cultivation region to improve reliability.
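The contents reported later are expressed in mg/g of dried powder, whereas the instrument returns concentrations in the extract. A minimal sketch of that conversion, assuming the 40 ml extraction volume and 1.0 g sample mass described above and no further dilution before injection; the `dilution_factor` parameter is a hypothetical addition for cases where the extract is diluted:

```python
def content_mg_per_g(conc_ng_per_ml: float,
                     extract_volume_ml: float = 40.0,
                     sample_mass_g: float = 1.0,
                     dilution_factor: float = 1.0) -> float:
    """Convert an extract concentration (ng/ml) into mg per g of dried powder.

    Assumes the whole analyte mass is contained in `extract_volume_ml` of extract
    and that the injected solution was diluted `dilution_factor` times (1.0 = none).
    """
    total_ng = conc_ng_per_ml * dilution_factor * extract_volume_ml
    return total_ng / sample_mass_g / 1e6  # 1 mg = 1e6 ng


# Hypothetical example: 25,000 ng/ml in 40 ml extracted from 1.0 g -> 1.0 mg/g (0.1%)
print(content_mg_per_g(25_000.0))
```

On this basis a content of 1 mg/g corresponds to the 0.1% level used later when comparing amino acid contents.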

UPLC-MS/MS Analysis Conditions
Chromatographic analysis was performed on a Waters ACQUITY UPLC™ system (Waters, Milford, MA, United States) equipped with a binary solvent manager, a column manager and an autosampler. An ACQUITY BEH amide column (100 mm × 2.1 mm, 1.7 µm) was used at a column temperature of 30°C, and the autosampler was maintained at 10°C. The binary mobile phase consisted of A (0.2% formic acid, 5 mM ammonium formate and 5 mM ammonium acetate in H2O) and B (0.2% formic acid, 1 mM ammonium formate and 1 mM ammonium acetate in acetonitrile). The linear gradient elution program was: 0–3 min, 10% A; 3–9 min, 10–18% A; 9–15 min, 18–20% A; 15–16 min, 20–46% A; 16–18 min, 46% A. The injection volume was 1 μl, and the flow rate was 0.4 ml/min. Mass spectrometry was performed on a QTRAP 6500+ triple quadrupole linear ion-trap mass spectrometer equipped with a TurboV™ ion source (Applied Biosystems SCIEX, CA, United States) operating in positive ion mode. The QTRAP-MS/MS parameters were: ion-spray voltage, 5500 V; source temperature, 400°C; curtain gas, 40 psi; nebulizer gas (GS1, nitrogen), 40 psi; and auxiliary heater gas (GS2, nitrogen), 40 psi. The collisionally activated dissociation (CAD) gas was set to medium. Multiple reaction monitoring (MRM) mode, based on the selected precursor and product ions, was applied to analyze the 28 analytes (1–28). Raw data for all Chrysanthemi Flos samples were acquired with Analyst 1.6.3 software (Applied Biosystems SCIEX, CA, United States).
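The gradient program above can be read as a piecewise-linear table of %A against time. The sketch below is a hypothetical helper, not part of the original workflow, that interpolates the solvent composition at any time point. It also makes the elution logic visible: on the amide (HILIC-type) column the run starts at 90% acetonitrile (B), and the aqueous phase A is increased to elute the polar analytes.

```python
# (time in min, %A) breakpoints of the linear gradient described above
GRADIENT = [(0.0, 10.0), (3.0, 10.0), (9.0, 18.0),
            (15.0, 20.0), (16.0, 46.0), (18.0, 46.0)]


def percent_a(t_min: float) -> float:
    """Return %A (aqueous phase) at time t_min by linear interpolation."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-18 min program")
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)
    return GRADIENT[-1][1]  # not reached for valid input


for t in (0.0, 6.0, 15.5, 17.0):
    a = percent_a(t)
    print(f"{t:5.1f} min: {a:5.1f}% A, {100 - a:5.1f}% B")
```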

METHOD VALIDATION
The UPLC-MS/MS method established in this study was validated in terms of linearity, limits of detection (LOD), limits of quantification (LOQ), precision, repeatability, stability, recovery, and matrix effects, following our previous study (Chang et al., 2020).

Calibration Curves, LOD and LOQ
The mixed standard solution was diluted to a series of at least six appropriate concentrations, each analyzed in duplicate, to construct calibration curves. Linear regression was then performed by plotting the peak area of each analyte against its concentration. The LOD and LOQ of each analyte under the present UPLC-MS/MS conditions were determined at signal-to-noise ratios (S/N) of 3:1 and 10:1, respectively, where S/N was calculated as the peak height divided by the background noise.
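S/N is defined above as peak height over background noise, with LOD and LOQ read at S/N = 3 and 10. One common shortcut, assumed here rather than taken from the article, is to measure S/N for a low-level standard and extrapolate linearly to the concentrations expected to give S/N = 3 and 10; the standard's values below are hypothetical.

```python
def signal_to_noise(peak_height: float, noise: float) -> float:
    """S/N as defined above: peak height divided by background noise."""
    if noise <= 0:
        raise ValueError("noise must be positive")
    return peak_height / noise


def limit_from_sn(conc: float, peak_height: float, noise: float,
                  target_sn: float) -> float:
    """Concentration expected to give `target_sn`, assuming peak height is
    proportional to concentration near the limit (extrapolation from one standard)."""
    return conc * target_sn / signal_to_noise(peak_height, noise)


# Hypothetical low-level standard: 1.0 ng/ml, peak height 1500, noise 100 (S/N = 15)
lod = limit_from_sn(1.0, 1500.0, 100.0, target_sn=3.0)
loq = limit_from_sn(1.0, 1500.0, 100.0, target_sn=10.0)
print(f"LOD = {lod:.2f} ng/ml, LOQ = {loq:.2f} ng/ml")
```

Injecting standards at the estimated levels is the usual way to confirm such extrapolated limits.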

Precision, Repeatability and Stability
Intra- and inter-day variations were used to assess the precision of the method. For the intra-day precision test, the mixed standard solutions were analyzed six times within 1 day; for the inter-day precision test, the experiments were repeated on three consecutive days. To evaluate repeatability, six sample solutions were independently prepared from the same batch of Chrysanthemi Flos samples and analyzed in parallel. To test the stability of the sample solutions, one of these Chrysanthemi Flos sample solutions was stored at 25°C and analyzed at 0, 4, 8, 12, 24, and 48 h. All variations are expressed as relative standard deviation (RSD).
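Precision, repeatability and stability are all reported as RSD. A minimal sketch, assuming the sample (n − 1) standard deviation, which the article does not specify; the peak areas are hypothetical:

```python
import statistics


def rsd_percent(values):
    """Relative standard deviation (%) using the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical peak areas from six intra-day injections of one standard
intra_day = [10210.0, 10050.0, 9980.0, 10120.0, 10300.0, 10090.0]
print(f"intra-day RSD = {rsd_percent(intra_day):.2f}%")
```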

Recovery
A recovery test was conducted to evaluate the accuracy of the method (Li et al., 2013; Zhao et al., 2018). Known quantities of the 28 standards were spiked at high, middle and low levels into a certain amount of the Chrysanthemi Flos sample previously analyzed in the repeatability test. The spiked samples were then extracted, processed and quantified as described above, with triplicate experiments at each level. The average recovery was calculated as: recovery (%) = [(observed amount − original amount)/spiked amount] × 100 (Wu et al., 2015; Zhao et al., 2018).
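The recovery formula above translates directly into code. The sketch below applies it to each replicate and averages over a triplicate; the amounts are hypothetical, and reporting the RSD of the replicate recoveries alongside the mean is an assumption about how the spread was summarized.

```python
import statistics


def recovery_percent(observed: float, original: float, spiked: float) -> float:
    """Recovery (%) = (observed - original) / spiked * 100, as defined above."""
    return (observed - original) / spiked * 100.0


def mean_recovery(observed_reps, original: float, spiked: float):
    """Mean recovery (%) and its RSD (%) over replicate spiked samples."""
    recs = [recovery_percent(o, original, spiked) for o in observed_reps]
    mean = statistics.mean(recs)
    return mean, 100.0 * statistics.stdev(recs) / mean


# Hypothetical middle-level spike: 10.0 µg originally present, 5.0 µg added
mean, rsd = mean_recovery([14.8, 15.1, 14.9], original=10.0, spiked=5.0)
print(f"recovery = {mean:.2f}% (RSD {rsd:.2f}%)")
```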

Matrix Effects
The matrix effect was defined as ion suppression or enhancement during analyte ionization (Matuszewski et al., 2003; Guo et al., 2013). The slope comparison method was used to evaluate potential interference from co-eluting matrix constituents on the ESI response (Guo et al., 2013). Sample extracts were spiked with appropriate amounts of standards, following the procedure used for the recovery measurements above, and used to construct standard addition calibration curves. The slopes of these standard addition curves were then compared with those of pure aqueous standards at the same concentration levels. Matrix effects are expressed as a matrix factor, MF = slope(matrix)/slope(solvent): MF = 1 indicates no significant matrix effect, MF < 1 denotes ion suppression, and MF > 1 indicates ion enhancement (Silvestro et al., 2013). Before injection, the sample extracts were stored at 4°C for 24 h to allow interaction between the analytes and the sample matrix.
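The slope comparison can be sketched as follows, assuming ordinary (unweighted) least-squares slopes, which the article does not specify. In the standard addition curve the intercept reflects the analyte already present in the matrix, so only the slopes are compared. The data are hypothetical.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def matrix_factor(conc, areas_matrix, areas_solvent):
    """MF = slope(standard addition in matrix) / slope(aqueous standards)."""
    return ols_slope(conc, areas_matrix) / ols_slope(conc, areas_solvent)


# Hypothetical peak areas at the same added concentrations (ng/ml)
conc = [1.0, 2.0, 4.0, 8.0]
solvent = [1000.0 * c for c in conc]
matrix = [5000.0 + 950.0 * c for c in conc]  # intercept = endogenous analyte
mf = matrix_factor(conc, matrix, solvent)
print(f"MF = {mf:.2f}")  # MF below 1 points to suppression, above 1 to enhancement
```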

Multivariate Statistical Analysis
In this study, the contents of the 28 target compounds in the 90 Chrysanthemi Flos samples formed a data matrix with 90 rows and 28 columns, which was subjected to PCA and PLS-DA using SIMCA-P software (Ver. 13, Umetrics, Umeå, Sweden). The variable importance in projection (VIP) plot was used to screen quality markers among the nutritional compounds for Chrysanthemi Flos from the nine geographical origins. One-way analysis of variance (ANOVA) was conducted using SPSS 20.0 software (IBM SPSS Statistics, Chicago, IL, United States). Pearson correlation analysis, the hierarchical clustering analysis heat map, F-test, partial F-test and Cramér–von Mises (CVM) test were performed in R (Desharnais et al., 2017a; Wei and Simko, 2017). Furthermore, a support vector machine (SVM) model for classifying and predicting the geographical origins of Chrysanthemi Flos samples was established in MATLAB R2017a (MathWorks, Natick, United States).

Development of UPLC-MS/MS Analysis Conditions
Nucleobases, nucleosides, nucleotides, and amino acids are highly polar compounds that elute readily in mobile phases with a high aqueous proportion and are poorly retained on reversed-phase columns. Our previous study showed that, for these hydrophilic compounds in Chrysanthemi Flos, the ACQUITY BEH amide column (100 mm × 2.1 mm, 1.7 µm) provided better peak capacity, retention and resolution than the ACQUITY BEH C18 (100 mm × 2.1 mm, 1.7 µm) and ACQUITY HSS T3 (100 mm × 2.1 mm, 1.8 µm) columns under the same chromatographic conditions (Chang et al., 2020). Therefore, the ACQUITY BEH amide column was chosen for this study. In addition, different mobile phase modifiers, including ammonium formate, ammonium acetate and formic acid at different concentrations and in combination, were investigated in our preliminary experiments to improve the separation and peak shapes of the 28 target analytes (Chang et al., 2020). Consequently, a mixture of A (aqueous solution containing 5 mM ammonium formate, 5 mM ammonium acetate and 0.2% formic acid) and B (acetonitrile containing 1 mM ammonium formate, 1 mM ammonium acetate and 0.2% formic acid) was considered the most suitable mobile phase system. Typical chromatograms of the Chrysanthemi Flos sample (SYHBJ-S43) and the reference compound mixture solution are presented in Figure 2 and Supplementary Figure 3, respectively. For MS/MS detection, both positive and negative ion modes were first tested. For the 28 target analytes, positive ion mode gave lower background noise and higher sensitivity and signal intensity than negative ion mode, which made it easier to detect low-abundance analytes in the Chrysanthemi Flos extracts. Therefore, positive ion mode was adopted for the detection and characterization of the 28 target compounds. Because mutual interference may occur between compounds producing ions with similar m/z, MRM mode was used to improve the selectivity and sensitivity for these target compounds (Yao et al., 2015). The declustering potential (DP), collision energy (CE), cell exit potential (CXP) and entrance potential (EP) were then optimized for each analyte by injecting individual standard solutions in direct infusion mode to obtain the most selective and specific precursor/product ion pairs. The MRM parameters of the 28 investigated compounds are listed in Table 1.

FIGURE 2 | Typical UPLC-MS/MS chromatograms of the 28 target compounds for the Chrysanthemi Flos sample (SYHBJ-S43). The numbering of the 28 target compounds in the chromatograms is the same as in Table 1.

Table 2 notes: (a) The 28 analytes are the same as in Table 1; 1–2 are nucleobases, 3–11 are nucleosides, 12–13 are nucleotides, and 14–28 are amino acids, of which 14–21 are essential amino acids, 22–27 are non-essential amino acids and 28 is a non-protein amino acid. (b) Mean ± SD (n = 10). (c) Not detected. (d) TAA, total content of all amino acids; TEA, total content of essential amino acids; TNA, total content of TNN and TAA; TNEA, total content of non-essential amino acids; TNN, total content of nucleobases, nucleosides and nucleotides; TNS, total content of nucleosides.

UPLC-MS/MS Method Validation
The proposed UPLC-MS/MS method for quantifying the 28 target analytes was validated in terms of linearity, LOD, LOQ, intra- and inter-day precision, repeatability, stability, recovery, and matrix effect. As shown in Supplementary Table 2, the F-test yielded p-values < 0.05 for all 28 target compounds, indicating that the data were heteroscedastic; a weighting factor of 1/x² was therefore used in the calibration (Gu et al., 2014; Alladio et al., 2020). The partial F-test gave p-values > 0.05 for all 28 target compounds, so a linear calibration model was adopted for each of them. The coefficients of determination (r²) exceeded 0.9972 for all 28 compounds, indicating good linearity between the analyte concentrations and their peak areas within the tested ranges. The CVM test gave p-values > 0.05 for all 28 analytes, so all calibration curves were considered validated (Desharnais et al., 2017a; Desharnais et al., 2017b). The LODs and LOQs ranged from 0.03 to 22.62 ng/ml and from 0.10 to 86.65 ng/ml, respectively, showing that the method was sufficiently sensitive for the quantitative determination of these analytes. The RSD values of the intra- and inter-day variations were less than 3.62% and 6.24%, respectively (Supplementary Table 3). The RSD values for repeatability and stability were all no more than 6.84%, and all analytes in the Chrysanthemi Flos sample solutions remained stable for 48 h. Additionally, the overall recoveries ranged from 94.30% to 104.75%, with RSD values below 5.60% for all analytes.
Frontiers in Chemistry | www.frontiersin.org | August 2021 | Volume 9 | Article 689254
The slope ratios of the matrix curves to the neat solution curves were between 0.91 and 1.06, implying no significant ion suppression or enhancement for the 28 target compounds under the current conditions. These results demonstrated that the developed UPLC-MS/MS method was sensitive, repeatable, and accurate for determining the 28 nucleobases, nucleosides, nucleotides and amino acids in Chrysanthemi Flos from nine different cultivation regions.
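The 1/x² weighting mentioned above can be written as a closed-form weighted least-squares fit. A minimal sketch, assuming weights w = 1/x² on the nominal concentrations; the F-test, partial F-test and CVM test are not reproduced, and the calibration data are hypothetical.

```python
def weighted_linear_fit(x, y, weights=None):
    """Weighted least-squares line y = intercept + slope * x (default weights 1/x**2)."""
    if weights is None:
        weights = [1.0 / xi ** 2 for xi in x]
    s = sum(weights)
    sx = sum(w * xi for w, xi in zip(weights, x))
    sy = sum(w * yi for w, yi in zip(weights, y))
    sxx = sum(w * xi * xi for w, xi in zip(weights, x))
    sxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    slope = (s * sxy - sx * sy) / (s * sxx - sx ** 2)
    intercept = (sy - slope * sx) / s
    return slope, intercept


def back_calculate(area, slope, intercept):
    """Concentration corresponding to a measured peak area."""
    return (area - intercept) / slope


# Hypothetical calibration standards (ng/ml) and peak areas
conc = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0, 200.0]
area = [105.0, 198.0, 510.0, 985.0, 2040.0, 4950.0, 10150.0, 19700.0]
b, a = weighted_linear_fit(conc, area)
for c, y in zip(conc, area):
    print(f"{c:6.1f} ng/ml -> back-calculated {back_calculate(y, b, a):7.2f}")
```

Because each residual is divided by x, the low standards are not swamped by the high ones, which is the point of weighting heteroscedastic calibration data.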

Distribution of 28 Target Analytes in Chrysanthemi Flos From Nine Different Geographical Origins
The UPLC-MS/MS method developed in this study was used to simultaneously quantify 2 nucleobases (1–2), 9 nucleosides (3–11), 2 nucleotides (12–13) and 15 amino acids (14–28) in Chrysanthemi Flos samples collected from nine geographical origins. Of the 15 amino acids, 8 are essential amino acids (14–21), 6 are non-essential amino acids (22–27), and 1 is a non-protein amino acid (28). Table 2 shows the content of each target compound (1–28) and the total contents of the analyzed compounds in Chrysanthemi Flos from the nine cultivation regions. Almost all samples were rich in the 28 nutrients, but their contents differed markedly between origins. The total content of the 28 target compounds ranged from 3.19 mg/g in TXHBJ to 27.43 mg/g in CJ, in the order CJ > QJ > GJ > SYHBJ > BJ > JJ > HBJ > FBJ > TXHBJ. In every origin, the total content of the 15 amino acids was much higher than that of the 13 nucleobases, nucleosides and nucleotides, by a factor of about 2.10–17.41. The total content of the 13 nucleobases, nucleosides and nucleotides was highest in FBJ (1.58 mg/g), whereas the total content of the 15 amino acids was highest in CJ (25.94 mg/g). Notably, in all nine origins the total content of the six non-essential amino acids was significantly higher than that of the eight essential amino acids, by a factor of approximately 3.58–7.21.
In summary, from nutritional and functional points of view, Chrysanthemi Flos from different geographical origins might serve as promising natural sources for manufacturing different functional products, such as tea and other beverages, according to the characteristics of their nutritional ingredients.
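As a worked check of how the totals defined in the table notes relate (TNA = TNN + TAA), the CJ figures reported above reproduce the upper end of the 2.10–17.41 range of amino acid to nucleobase/nucleoside/nucleotide ratios:

```python
# Totals for CJ as reported above (mg/g): TNA = all 28 analytes, TAA = 15 amino acids
tna_cj = 27.43
taa_cj = 25.94
tnn_cj = tna_cj - taa_cj   # nucleobases + nucleosides + nucleotides
ratio = taa_cj / tnn_cj    # amino acids relative to TNN
print(f"TNN = {tnn_cj:.2f} mg/g, TAA/TNN = {ratio:.2f}")
```

This prints TNN = 1.49 mg/g and TAA/TNN = 17.41.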
In terms of individual compounds, remarkable differences were also observed among the nine geographical origins of Chrysanthemi Flos. The content of uridine was evidently higher than those of the other nucleobases, nucleosides and nucleotides in BJ, CJ, GJ, HBJ, and QJ, whereas adenosine was the most abundant of these compounds in SYHBJ, TXHBJ, and FBJ. In JJ, the concentration of cGMP was relatively higher than those of all the other nucleobases, nucleosides and nucleotides. Among the free amino acids, three non-essential amino acids (proline, glutamine and asparagine) exceeded 0.1% in most geographical origins, making them notably more abundant than the other amino acids. More interestingly, the content of glutamate, an important umami amino acid, was considerably higher (greater than 0.1%) in SYHBJ, FBJ, and JJ than in the other geographical origins. These results suggest that differences in the distribution of these nutrients might be one of the reasons for the reputed differences in flavor among Chrysanthemi Flos from the nine regions.

Pearson Correlation Analysis of the 28 Target Analytes and Elevation in Different Geographical Origins of Chrysanthemi Flos
Pearson correlation analysis between the 28 target compounds and elevation for Chrysanthemi Flos from the nine geographical origins was performed with the R package "corrplot" (Wei and Simko, 2017). As illustrated in Figure 3, warm and cold colors represent positive and negative correlations, respectively, and darker colors and larger circles represent larger correlation coefficients and stronger correlations. Strong positive correlations were observed among 2 nucleobases (uracil and guanine), 5 nucleosides (2′-deoxythymidine, uridine, 2′-deoxycytidine, 2′-deoxyguanosine and cytidine), 8 essential amino acids, 5 non-essential amino acids (proline, tyrosine, glutamine, asparagine, and arginine) and 1 non-protein amino acid (GABA), with almost all Pearson correlation coefficients greater than 0.6. These results indicated that these nutrients (especially amino acids) accumulated in similar patterns. Furthermore, adenosine, cAMP and cGMP were negatively correlated with most of the target compounds investigated, with almost all absolute correlation coefficients greater than 0.6. It is worth noting that the altitude of the cultivation regions ranged from 2 m for SYHBJ (the lowest) to 154 m for GJ (the highest). Correlation analysis showed that elevation was negatively correlated with cAMP and cGMP and positively correlated with uracil, uridine, cytidine, tyrosine, and GABA, but all these Pearson correlation coefficients were below 0.6 in absolute value. Therefore, altitude might not be the main factor leading to the differences in nutritional components of Chrysanthemi Flos from different geographical origins, which requires further investigation.
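"corrplot" only draws the correlation matrix; the coefficients themselves are ordinary Pearson r values, as returned by R's `cor()`. A NumPy equivalent for correlating each analyte with elevation and flagging |r| > 0.6 is sketched below. Apart from the 2 m and 154 m extremes quoted above, the elevations and analyte values are hypothetical.

```python
import numpy as np


def correlation_with(target, X):
    """Pearson r between `target` (n,) and each column of X (n, p)."""
    data = np.column_stack([np.asarray(target, float), np.asarray(X, float)])
    r = np.corrcoef(data, rowvar=False)
    return r[0, 1:]


# Hypothetical data: elevation (m) for four samples and two analyte columns
elevation = np.array([2.0, 30.0, 80.0, 154.0])
X = np.column_stack([0.5 * elevation + 3.0,  # rises with elevation
                     -elevation])            # falls with elevation
r = correlation_with(elevation, X)
strong = np.abs(r) > 0.6                     # threshold used in the text
print(np.round(r, 3), strong)
```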

Discovery of Quality Markers for Nine Different Geographical Origins of Chrysanthemi Flos
Multivariate statistical analysis, including PCA and PLS-DA, was performed on the contents of the 28 target analytes to clarify the intrinsic similarities and differences among Chrysanthemi Flos from the nine geographical origins and to screen potential quality markers. PCA, an unsupervised method, was first used to explore underlying trends among samples from different cultivation regions after mean-centering and Pareto scaling. Under the default cross-validation settings (7-fold cross-validation), ten principal components (PCs) were significant for the PCA model. The first two PCs dominated the model and together accounted for 78.0% of the variation, indicating that they captured most of the structure in the data. As seen in Supplementary Figure 4, the ninety Chrysanthemi Flos samples were well separated into nine groups corresponding to their cultivation regions. These results indicated that nucleobases, nucleosides, nucleotides and amino acids could be used to distinguish Chrysanthemi Flos from the nine geographical origins and had potential for quality control of Chrysanthemi Flos.
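SIMCA-P performs the scaling and PCA internally. To reproduce the preprocessing elsewhere, a minimal NumPy sketch of mean-centering with Pareto scaling (division by the square root of each column's standard deviation) followed by PCA via singular value decomposition is shown below. The explained fraction corresponds to the R²X values quoted above. Cross-validation for choosing the number of components is not reproduced, and the data are random placeholders.

```python
import numpy as np


def pareto_scale(X):
    """Mean-centre each column and divide by the square root of its standard deviation."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant columns unscaled
    return (X - X.mean(axis=0)) / np.sqrt(sd)


def pca_scores(X, n_components=2):
    """PCA by SVD of the Pareto-scaled matrix: scores, loadings, explained fraction."""
    Xs = pareto_scale(X)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings, explained[:n_components]


# Hypothetical 90 x 28 content matrix (rows = samples, columns = analytes)
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=0.5, size=(90, 28))
scores, loadings, r2x = pca_scores(X)
print(scores.shape, loadings.shape, f"R2X(PC1+PC2) = {r2x.sum():.3f}")
```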
Then, a supervised PLS-DA model was built with the same data used for PCA to further discover potential quality markers for the nine geographical origins of Chrysanthemi Flos. As shown in the two-dimensional PLS-DA score plot (Figure 4), the samples from the nine geographical areas were clearly classified into nine clusters in different quadrants within the 95% Hotelling's T² ellipse, similar to the PCA results. The performance of a PLS-DA model is usually evaluated by considering both the explained variation R²Y (goodness of fit) and the predicted variation Q² (goodness of prediction) (Chang et al., 2017; Chang et al., 2021). Nine components were significant for the PLS-DA model, and the model fit parameters were R²Y = 0.928 and Q² = 0.915 under 7-fold cross-validation, revealing excellent classification and predictive ability. A permutation test, which evaluates the statistical significance of a PLS-DA model, was then used to further validate the established model. In 200 random permutations (Supplementary Figure 5), the R² and Q² intercepts were 0.042 and −0.381, respectively, indicating that the PLS-DA model was statistically valid and not overfitted. Next, potential quality markers for the nine geographical origins were selected according to the VIP plot of the PLS-DA model. As shown in Figure 5, eight target analytes with VIP > 1.0, namely 2′-deoxyadenosine, guanosine, cAMP, cGMP, arginine, proline, glutamate, and tryptophan (in descending order of VIP), contributed most to the differentiation of Chrysanthemi Flos from the nine geographical origins and were considered quality markers.
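SIMCA-P computes VIP internally. To make the VIP > 1.0 criterion concrete, the sketch below fits a minimal NIPALS PLS2 model with a one-hot (dummy) class matrix, as is usual for PLS-DA, and computes VIP with a common textbook formula. This is an illustration under those assumptions, not SIMCA's implementation: scaling, cross-validation and the permutation test are omitted, and the data are random placeholders. Because the squared VIP values average to 1 by construction, VIP > 1 selects variables with above-average influence.

```python
import numpy as np


def pls_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """Minimal NIPALS PLS2. X (n, p) and Y (n, m) are column-centred internally.
    Returns X weights W (p, a), X scores T (n, a) and Y loadings Q (m, a)."""
    X = np.asarray(X, float) - np.mean(X, axis=0)
    Y = np.asarray(Y, float) - np.mean(Y, axis=0)
    W = np.zeros((X.shape[1], n_components))
    T = np.zeros((X.shape[0], n_components))
    Q = np.zeros((Y.shape[1], n_components))
    for a in range(n_components):
        u = Y[:, [int(np.argmax(Y.var(axis=0)))]]  # start from most variable Y column
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)
            t = X @ w
            q = Y.T @ t / (t.T @ t)
            u_new = Y @ q / (q.T @ q)
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = X.T @ t / (t.T @ t)
        X = X - t @ p.T  # deflate X
        Y = Y - t @ q.T  # deflate Y
        W[:, a], T[:, a], Q[:, a] = w[:, 0], t[:, 0], q[:, 0]
    return W, T, Q


def vip_scores(W, T, Q):
    """VIP_j = sqrt(p * sum_a SS_a * (w_ja/||w_a||)^2 / sum_a SS_a), where
    SS_a = ||q_a||^2 * ||t_a||^2 is the Y sum of squares explained by component a."""
    n_vars = W.shape[0]
    ss = np.sum(Q ** 2, axis=0) * np.sum(T ** 2, axis=0)
    w2 = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(n_vars * (w2 @ ss) / ss.sum())


# Hypothetical data: 9 origins x 10 samples, 28 analytes; analyte 0 differs by origin
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(9), 10)
X = rng.normal(size=(90, 28))
X[:, 0] += 3.0 * labels
Y = np.eye(9)[labels]  # one-hot class membership (dummy Y)
W, T, Q = pls_nipals(X, Y, n_components=3)
vip = vip_scores(W, T, Q)
print("VIP > 1:", np.where(vip > 1.0)[0])
```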
All eight quality markers showed statistically significant differences among the nine geographical origins of Chrysanthemi Flos by one-way ANOVA. Moreover, a hierarchical clustering analysis heat map (Figure 6) was used to visualize the distribution of the eight quality markers across the nine geographical origins.
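The one-way ANOVA used above compares between-origin and within-origin variance for each marker. A minimal NumPy sketch of the F statistic with its degrees of freedom is shown below, using hypothetical values for three origins. The p-value comes from the F distribution; `scipy.stats.f_oneway` returns the same statistic together with its p-value.

```python
import numpy as np


def one_way_anova_f(groups):
    """One-way ANOVA F statistic with its degrees of freedom (between, within)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    values = np.concatenate(groups)
    grand_mean = values.mean()
    k, n = len(groups), values.size
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical contents (mg/g) of one marker in three origins
f_stat, df1, df2 = one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(f"F({df1}, {df2}) = {f_stat:.2f}")  # F(2, 6) = 27.00
```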
To test whether the eight selected quality markers could discriminate the geographical origins of Chrysanthemi Flos, PLS-DA was performed again on the samples from the nine cultivation regions using only these eight markers. The model parameters were R²Y = 0.782, Q² = 0.765, R² intercept = 0.008, and Q² intercept = −0.251, indicating good fit and prediction without overfitting (Supplementary Figure 6). As observed in Supplementary Figure 6A, the ninety Chrysanthemi Flos samples were clearly and accurately classified into nine groups according to their geographical origins. Notably, the SYHBJ and JJ samples were also clearly separated and no longer overlapped. These results demonstrated that the eight screened nutritional compounds were robust quality markers that can clearly discriminate Chrysanthemi Flos from different cultivation regions.

SVM for the Classification and Prediction of Chrysanthemi Flos in Different Geographical Origins
SVM is a supervised learning model with excellent learning performance and is especially suitable for analyzing small data sets (Ma et al., 2016; Li et al., 2017). As an effective tool, SVM has been successfully applied to the quality control of foods and Chinese herbal medicines with satisfactory classification and prediction accuracy (Richter et al., 2019; Jin et al., 2021). Therefore, an SVM model for classifying and predicting the geographical origins of Chrysanthemi Flos was established using the contents of the eight quality markers as input vectors and the nine regions as class labels. The radial basis function (RBF) was selected as the kernel function. Meanwhile, 10-fold cross-validation was performed to improve prediction accuracy and avoid overfitting of the established SVM model (Gao et al., 2012; Wang et al., 2017). The ninety Chrysanthemi Flos samples from the nine regions were divided into a calibration set (63 samples for developing the discrimination model) and a prediction set (27 samples for testing the prediction accuracy of the established SVM model) using the Kennard–Stone algorithm. The best combination of the penalty parameter C and the kernel parameter g (gamma) was determined by grid search combined with 10-fold cross-validation. As shown in Figure 7, the optimal combination was best c = 9.5367 × 10⁻⁷ and best g = 9.5367 × 10⁻⁷, with a CV accuracy of 100%, indicating that the established SVM model was not overfitting (Li et al., 2020). As illustrated in Table 3, all Chrysanthemi Flos samples were correctly assigned to the nine groups corresponding to their geographical origins, and the prediction accuracy reached 100%.
Thus, the results strongly indicated that the developed SVM model based on eight quality markers was a powerful tool for the classification and prediction of geographical origins of Chrysanthemi Flos, further demonstrating the reliability of the screened quality markers.
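The Kennard–Stone split described above is deterministic: it starts from the two most distant samples and repeatedly adds the sample whose nearest already-selected neighbour is farthest away, so the calibration set spans the data space evenly. The sketch below assumes unscaled Euclidean distances computed over the whole data set; the scaling and any per-origin stratification used in the study are not specified. The data are random placeholders.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Kennard-Stone selection. Returns (selected indices, remaining indices)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # Euclidean
    i, j = np.unravel_index(np.argmax(d), d.shape)  # two most distant samples
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    min_d = np.minimum(d[remaining, i], d[remaining, j])  # distance to nearest selected
    while len(selected) < n_select:
        pos = int(np.argmax(min_d))  # farthest from the current selection
        new = remaining.pop(pos)
        selected.append(new)
        min_d = np.delete(min_d, pos)
        if remaining:
            min_d = np.minimum(min_d, d[remaining, new])
    return np.array(selected), np.array(remaining, dtype=int)


# Hypothetical 90 x 8 matrix of quality-marker contents
rng = np.random.default_rng(2)
X = rng.normal(size=(90, 8))
cal_idx, pred_idx = kennard_stone(X, 63)
print(len(cal_idx), len(pred_idx))  # 63 27
```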

CONCLUSION
From the perspective of nutritional ingredients, this study proposed a strategy for screening quality markers of Chrysanthemi Flos from nine different geographical origins using UPLC-MS/MS combined with multivariate statistical analysis. A UPLC-MS/MS method capable of quantifying 28 nucleobases, nucleosides, nucleotides, and amino acids was first established and then applied to study the variation of these nutrients in Chrysanthemi Flos from nine regions. The results revealed that Chrysanthemi Flos from the nine cultivation regions was rich in nucleobases, nucleosides, nucleotides and amino acids, that their contents differed significantly, and that altitude was not the main reason for these differences. Next, eight quality markers for the nine geographical origins of Chrysanthemi Flos were discovered for the first time from these 28 nutritional compounds using PLS-DA and the VIP plot. Furthermore, a heat map was used to visualize the distribution of the eight quality markers. More importantly, the SVM model based on the eight quality markers showed excellent classification and prediction performance for Chrysanthemi Flos from different geographical origins. The proposed approach is helpful for elaborating more specific quality evaluation standards for Chrysanthemi Flos and provides a simple and reliable method for discovering quality markers of other traditional Chinese medicines (TCMs) and foods.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.