Screening of Combinatorial Quality Markers for Natural Products by Metabolomics Coupled With Chemometrics. A Case Study on Pollen Typhae

Natural products, especially traditional Chinese medicines (TCMs), are of great importance in treating diseases. Yet it has been hard to screen for the influential quality markers needed to monitor their quality. A simple and comprehensive strategy was developed and validated to screen for combinatorial quality markers for precise quality evaluation and discrimination of natural products. In this study, Pollen Typhae (PT) and its processed product, carbonized PT, were selected as the representative case. Firstly, metabolomics data of 49 batches of crude PT and carbonized PT were obtained by ultra high-performance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (UHPLC-Q-TOF/MS). Then, metabolomics approaches were used to screen for the potential markers that lead to the quality difference. Finally, chemometric methods were used to validate the accuracy of the combinatorial quality markers. As a result, 42 compounds were identified from PT, and 5 markers (isorhamnetin-3-O-(2G-α-L-rhamnosyl)-rutinoside, isorhamnetin-3-O-neohesperidoside, astragalin, kaempferol and umbelliferone) were successfully screened, identified, quantified and regarded as combinatorial quality markers for precise quality evaluation of crude and carbonized PT. The established comprehensive strategy provides an efficient tool for the precise, holistic quality evaluation of natural products.


INTRODUCTION
Natural products, including traditional Chinese medicines (TCMs), have gained extensive attention and are used to treat diseases because of their valuable biological effects, including antioxidant, anticancer and antimicrobial activity (Uysal et al., 2017). In contrast to synthetic medicines, many Chinese herbs are commonly subjected to different processing procedures, such as stir-frying, steaming, baking and braising (Wang et al., 2017). Even though a TCM and its processed products come from the same herb source, their chemical constituents, pharmaceutical activity and clinical applications may vary with the processing procedure. For instance, wine-processed Radix Scutellariae was more effective than crude Radix Scutellariae in clearing lung fire and upper-energizer heat and humidity, aided by the properties of the wine (Cui et al., 2014). At present, crude TCMs and their processed products share the same quality specification. It is worth noting that only a few constituents of TCMs are regarded as markers for quality evaluation in the Chinese Pharmacopoeia (Wang et al., 2017). However, TCMs are well known for their "multi-chemical components," and a few constituents cannot stand in for all constituents as markers for quality evaluation (Yang et al., 2017). Thus, for precise quality evaluation of crude herbs and their processed products, it is vital to screen for quality markers according to their effectiveness, practicability and safety.
Pollen Typhae (PT), namely Puhuang, the dry pollen of Typhaceae plants (Typha angustifolia L., Typha orientalis Presl or plants of the same genus), was originally recorded in the "Shen Nong Ben Cao Jing." It is commonly used as a herbal medicine to treat dysmenorrhea, stranguria, stroke and angina pectoris (Yu et al., 2017a). The main chemical constituents of PT are flavonoids, sterols, long-chain hydrocarbons, amino acids, organic acids and so on (Qin and Sun, 2005). Pharmacological research has indicated that these characteristic constituents have antioxidant, anti-inflammatory, antigenotoxic, anti-protozoal (Lijun et al., 1998), serum cholesterol-lowering (Jia et al., 1990), procoagulant, anticoagulant (Gibbs et al., 1983) and hemostatic activities (Chen et al., 2015). PT has traditionally been processed by carbonizing. Modern pharmacological studies have shown that carbonized PT is more effective in hemostasis. The reason may be that certain compounds are enhanced or reduced during the processing of PT (Zhou et al., 2016). At present, both crude PT and carbonized PT are commercially available in herbal markets. Unfortunately, there is no exact standard for distinguishing PT from carbonized PT. According to the Chinese Pharmacopoeia (2015 Version), the total content of isorhamnetin-3-O-(2G-α-L-rhamnosyl)-rutinoside and isorhamnetin-3-O-neohesperidoside (not less than 0.5%) is officially set as the acceptance criterion for both crude and carbonized PT, which may not be specific or meaningful. Thus, it is urgent to develop a method to discover discriminatory quality markers of PT and carbonized PT.
Chemometrics is a discipline that applies mathematics, statistics and computer science to chemistry (Lavine and Workman, 2008). Its main tasks are to manipulate and analyze chemical data and to design and choose optimum measurement procedures so as to obtain as much chemical information as possible (Kumar et al., 2014). Random forest (RF) and the adaptive boosting algorithm (AdaBoost) are commonly used chemometric methods, which are considered valuable instruments for data classification and accuracy prediction in natural products research (Gotoh et al., 2017; Xia et al., 2017). At present, ultra high-performance liquid chromatography tandem mass spectrometry (UHPLC-MS), high performance liquid chromatography tandem mass spectrometry (HPLC-MS) and gas chromatography-mass spectrometry (GC-MS) are widely used in metabolomics research (Zhao et al., 2012a,b). Among these analytical techniques, UHPLC-Q-TOF/MS provides information more rapidly and efficiently, and it is widely applicable to quantitative and qualitative analysis with high selectivity and sensitivity (Yu et al., 2017b; Mocan et al., 2018). Thus, UHPLC-Q-TOF/MS is a powerful method that promotes the development of natural products research, especially for TCMs.
The concept of quality markers for the quality evaluation of traditional Chinese medicine was first proposed by Changxiao Liu (Yang et al., 2017). In this study, a metabolomics strategy based on UHPLC-Q-TOF/MS coupled with chemometric methods was established for discovering the combinatorial quality markers of natural products. Metabolomics analysis was used to discover chemical markers that could represent the quality of natural products. Chemometric models were used to establish a new method for validating the accuracy of the screened markers. The strategy of the research is summarized in Figure 1. The samples from different producing areas were collected and classified. Then, the chemical constituents were identified by the UHPLC-Q-TOF/MS method. Subsequently, the potential markers were screened by metabolomics analysis. The final combinatorial markers were then screened and identified according to three rules. The first and second rules are that the potential markers can be easily quantified and commercially obtained, respectively. The third rule is that the potential markers can represent the whole chemical information of the natural product with high accuracy. Based on these three rules, the screened combinatorial quality markers were used to establish the quality specification and evaluate the quality of the natural products. Pollen Typhae (PT) was selected as an example. A simple and efficient metabolomics strategy coupled with chemometrics was established and validated for the first time to screen for and identify the discriminatory combinatorial quality markers of natural products.

Plant Material
A total of 49 batches of Pollen Typhae and carbonized Pollen Typhae (P1-P32 and VP1-VP12 for crude PT, CP1-CP17, and VCP13-VCP22 for carbonized PT) were purchased from different regions of China and authenticated by Dr. Yan-xu Chang (Tianjin University of Traditional Chinese Medicine).

Preparation of Sample Solutions and Reference Standards
Preparation of sample solution: 0.500 g of well-mixed PT sample powder was accurately weighed and added to 10 mL of 70% methanol-water solution. The mixture was then extracted ultrasonically (40 kHz, 1,200 W power) for 40 min. All the solutions were centrifuged at 14,000 rpm and filtered through a 0.22 µm filter membrane before analysis. All solutions were stored at 4 °C until use.

UHPLC-Q-TOF/MS Analysis
An Agilent 1290 UPLC system (Agilent Technologies, Waldbronn, Germany) connected to an Agilent 6520 Q-TOF mass spectrometer (Agilent Corporation, Santa Clara, CA, USA) via an ESI interface was used to identify the chemical components in the crude PT extract. The gradient program of the mobile phase was the same as in the UHPLC-PDA conditions. The optimum operating parameters of the Q-TOF/MS were set as follows: drying gas, N2; flow rate, 0.3 mL/min; drying gas temperature, 350 °C; nebulizer gas pressure, 40 psig; capillary voltage, 3,500 V; fragmentor voltage, 120 V; skimmer voltage, 65 V; octopole RF, 750 V; collision energy (CE), 30 and 40 V. In positive and negative ion modes, [M+H]+ and [M-H]− ions, respectively, were used to select the precursor ions that were subjected to MS/MS analysis. The detection range was m/z 50-1,500 and the UV wavelength was 210-400 nm.

Validation of the Qualitative Method
Validation experiments were performed to investigate the precision, repeatability and stability. P1 sample solution was used for the method validation. The precision was assessed by one sample with six replicate injections. The repeatability of the method was investigated by using six replicate solutions. The stability of those analytes was assessed by analysing the solution at 0, 2, 4, 6, 8, 12, and 24 h. The validation was expressed as the relative standard deviation (RSD).
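The RSD used throughout the validation can be sketched as follows. This is a minimal illustration, assuming the usual definition (sample standard deviation divided by the mean, times 100); the function name `rsd_percent` and the peak areas are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation in percent: sample SD / |mean| * 100."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; RSD is undefined")
    return stdev(values) / abs(m) * 100.0


# Hypothetical peak areas from six replicate injections of one sample solution.
peak_areas = [1020.0, 1005.0, 998.0, 1012.0, 1001.0, 1009.0]
print(f"Precision RSD = {rsd_percent(peak_areas):.2f}%")
```

The same function applies to the repeatability (six replicate solutions) and stability (one solution over 24 h) series; only the input list changes.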

Validation of the Quantitative Method

Calibration curve, limits of detection, and limits of quantification
For the calibration curves, the standard stock solution of each of the 5 quantitative compounds was diluted to a series of concentrations, and each concentration was measured in triplicate. Calibration curves were established by plotting the peak area vs. the concentration of the corresponding analyte solution. The repeatability was tested with six independent samples and expressed as the RSD. The limits of detection (LODs) and limits of quantification (LOQs) were determined by further diluting the lowest concentration of the mixed standard stock solution until the signal-to-noise (S/N) ratio reached 3 and 10, respectively.
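The calibration step can be illustrated with an ordinary least-squares fit of peak area against concentration. This is a sketch under stated assumptions: an unweighted straight-line model is assumed (the text does not specify the regression model), and the helper names and the numbers are hypothetical.

```python
from math import sqrt


def fit_calibration(conc, area):
    """Fit area = slope * conc + intercept by least squares; also return Pearson r."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(area) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    syy = sum((y - mean_y) ** 2 for y in area)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, area))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to estimate a concentration from a peak area."""
    return (area - intercept) / slope


# Hypothetical standard series (µg/mL) and mean peak areas of triplicate runs.
conc = [1.0, 5.0, 10.0, 25.0, 50.0]
area = [21.0, 99.0, 204.0, 498.0, 1003.0]
slope, intercept, r = fit_calibration(conc, area)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.4f}")
print(f"area 300 -> {concentration_from_area(300.0, slope, intercept):.2f} µg/mL")
```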

Precision, stability, repeatability, and recovery
The precision of the method was evaluated by intra-day and inter-day variations, using six replicates within one day and a repeat of the same steps on three consecutive days, respectively. Six samples prepared in the same way were extracted and analyzed under the same conditions to measure the repeatability. The stability of the sample solution was assessed by 6 replicate injections at 0, 2, 4, 6, 8, 10, 12 and 24 h. The recovery test was performed by adding the mixed solution of the 5 standards to the PT extract samples. All results were evaluated by RSD.

Data Analysis
The metabolomics data acquired in the negative-ionization mode of UHPLC-Q-TOF/MS were imported into the XCMS package running in R (R Foundation for Statistical Computing, Vienna, Austria). All detected peaks were tabulated as tR-m/z pairs and then exported for statistical analyses. Then, the preliminarily screened data were subjected to partial least-squares discriminant analysis (PLS-DA) in Simca-P (version 14.1, Umetrics, Umeå, Sweden). Two algorithms (RF and AdaBoost) were employed to calculate the accuracy of the screened markers in Matlab R2015B (Mathworks, Natick, USA). Finally, the selected combinatorial quality markers were subjected to Fisher's discriminant analysis (FDA) to establish the discriminant function from 49 PT samples (P1-P32, CP1-CP17). The FDA was done in SPSS version 19.0 (SPSS, Chicago, IL, USA).

Sample Extraction Condition Optimization
Single factor and orthogonal experiments are effective tools for optimizing extraction conditions in natural products (Cai et al., 2011), and they were applied in this experiment to optimize the PT sample extraction conditions. The total contents of isorhamnetin-3-O-(2G-α-L-rhamnosyl)-rutinoside, isorhamnetin-3-O-neohesperidoside, astragalin, kaempferol and umbelliferone were chosen as the indexes. Single factor tests were performed to obtain a reasonable range of data for the orthogonal experiment. A total of 0.25, 0.5, and 0.75 g PT was added to 10 mL of 70% methanol-water solution and then sonicated (40 kHz, 1,200 W power) at room temperature for 30 min. The results showed that the 0.5 g/10 mL sample/solvent ratio gave higher extraction efficiency. A total of 0.5 g PT was added to 10 mL of 70% methanol-water solution and sonicated for 30, 45, and 60 min (40 kHz, 1,200 W power). The results indicated that 50 min ultrasonication time produced desirable extraction efficiency. 70, 80, and 90% methanol-water solutions were also investigated. The 80% methanol-water solution was selected because of the relatively high value obtained. Based on the above results of the single factor tests, key factors such as methanol concentration (70, 80, and 90%), sample/solvent ratio (0.25, 0.5, and 0.75 g/10 mL) and ultrasonication time (30, 40, and 50 min) were tested by using an orthogonal L9(3^4) experiment. The results indicated that a sample/solvent ratio of 0.5 g/10 mL, 70% methanol-water solution and 40 min extraction time produced desirable extraction efficiency.
Frontiers in Pharmacology | www.frontiersin.org
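The orthogonal design can be made concrete with the standard L9(3^4) array. The sketch below is illustrative only: the assignment of the three factors to the first three columns (leaving the fourth as a blank column) is an assumption, since the text does not state the column assignment, and the helper names are hypothetical.

```python
from itertools import combinations

# Standard L9(3^4) orthogonal array: 9 runs, 4 columns, levels coded 1-3.
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]

# Factor levels taken from the text; the column order is an assumption.
FACTORS = {
    "methanol_percent": (70, 80, 90),
    "sample_g_per_10mL": (0.25, 0.5, 0.75),
    "sonication_min": (30, 40, 50),
}


def is_orthogonal(array):
    """Each pair of columns must contain all 9 level pairs exactly once."""
    for a, b in combinations(range(len(array[0])), 2):
        if len({(row[a], row[b]) for row in array}) != 9:
            return False
    return True


def design_runs():
    """Translate coded levels of the first three columns into real settings."""
    names = list(FACTORS)
    return [{n: FACTORS[n][row[i] - 1] for i, n in enumerate(names)} for row in L9]


def level_means(responses, column):
    """Range analysis: mean response at each coded level of one column."""
    means = {}
    for level in (1, 2, 3):
        vals = [r for row, r in zip(L9, responses) if row[column] == level]
        means[level] = sum(vals) / len(vals)
    return means


for run_number, settings in enumerate(design_runs(), start=1):
    print(run_number, settings)
```

With nine measured total contents in run order, `level_means(responses, column)` gives the per-level averages whose largest value points to the preferred level of that factor.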

Optimization of Chromatographic Conditions
In order to achieve appropriate retention times and optimum resolution of the target compounds in natural products, the related factors, including the type of chromatographic column, the composition of the mobile phase, column temperature, flow rate, and detection wavelength, were investigated. In our UHPLC-PDA analysis method, isorhamnetin-3-O-(2G-α-L-rhamnosyl)-rutinoside, isorhamnetin-3-O-neohesperidoside, astragalin, kaempferol and umbelliferone were the target compounds. Two C18 columns (2.1 × 100 mm, 1.7 µm, Waters; 2.1 × 150 mm, 1.7 µm, Waters) and mobile phases of acetonitrile with different aqueous phases (0.1% formic acid in water or ultra-pure water) were compared. When the mobile phase was 0.1% aqueous formic acid solution and acetonitrile with the C18 column (2.1 × 150 mm, 1.7 µm, Waters), the peaks were well resolved and the peak shapes of the compounds were optimal. In this experiment, PDA detection was used to scan wavelengths of 200-400 nm, which demonstrated that the 5 compounds all had absorption peaks at 280 nm. The best performance was obtained at a flow rate of 0.3 mL/min and a column temperature of 35 °C. The UHPLC-Q-TOF/MS and UHPLC-PDA chromatograms are shown in Figures 2, 3.

Validation of the Qualitative Method
To validate the developed UHPLC-Q-TOF/MS method, the precision, repeatability and stability of the 11 screened qualitative markers were investigated. The relative standard deviations (RSDs) of precision were less than 4.76%, showing that the method was reliable for the qualitative analysis. The RSDs of the repeatability of the analytes were less than 4.85%, demonstrating that the qualitative method was reproducible. The RSDs of the stability were no more than 4.75%, demonstrating that the sample solutions were stable for 24 h at room temperature (Table S1). The above results verified that the UHPLC-Q-TOF/MS method could be used for the qualitative investigation of crude PT extract.

Validation of the Quantitative Method
Under the optimum chromatographic conditions, the correlation coefficients of the 5 quantitative components were more than 0.9991. The LODs and LOQs for the 5 quantitative components ranged from 0.02 to 0.25 µg/mL and from 0.05 to 0.5 µg/mL, respectively. Repeatability was investigated by analysing 6 independent samples, and the RSD values of repeatability were less than 3.69%. The intra-day and inter-day accuracies for the 5 compounds at three levels (low, medium and high) were 86.1-112% and 86.7-112%, respectively. The RSD values of the intra-day and inter-day precisions were 0.19-2.2% and 0.15-3.2%, respectively (Table S2). In the stability test, the remaining percentages were 95.8-114% and their RSD values were less than 2.1%, which indicated that the method had good stability (Table S3). Recovery was determined at low, medium and high levels (by accurately adding 80, 100, and 120% of the known amount of the 5 mixed standards to the analyzed PT sample). The recovery values of the 5 compounds were in the range of 95.8-104% (Table S4). These results demonstrated that the established method was accurate and reliable for the determination of the 5 markers in PT.
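The spiked-recovery calculation can be sketched as below. The formula (found amount minus original amount, divided by the amount added) is the conventional definition and is assumed here, since the text does not spell it out; the function names and the amounts are hypothetical.

```python
def spike_amounts(original_amount, levels=(0.8, 1.0, 1.2)):
    """Amounts of standard to add at 80, 100 and 120% of the known content."""
    return [original_amount * level for level in levels]


def recovery_percent(found_total, original_amount, spiked_amount):
    """Recovery (%) = (found - original) / spiked * 100."""
    if spiked_amount <= 0:
        raise ValueError("spiked amount must be positive")
    return (found_total - original_amount) / spiked_amount * 100.0


# Hypothetical example: 10.0 µg of a marker already present in the sample.
original = 10.0
found = [17.9, 19.8, 22.1]  # hypothetical measured totals after spiking
for spiked, total in zip(spike_amounts(original), found):
    print(f"spiked {spiked:.1f} µg -> recovery {recovery_percent(total, original, spiked):.1f}%")
```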

Compound Identification in Crude PT Extract
For the subsequent screening of discriminatory and combinatorial quality markers in natural products, the study of the material basis is very important. Thus, UHPLC-Q-TOF/MS was employed to identify the chemical compounds of natural products. In this study, both positive and negative ion modes were used to obtain more comprehensive information on the crude PT extract. The total ion chromatograms (TIC) are shown in Figure 2. As shown in Table 1, 42 compounds were identified or tentatively characterized according to previous reports.

Identification of Amino Acids
Amino acids are a class of organic compounds that contain amino and carboxyl groups. Because of this structure, neutral losses of HCOOH (46 Da) and NH3 (17 Da) are readily generated. Their high polarity gives amino acids short retention times on a reversed-phase column. Based on these features, 3 amino acids (Peaks 1, 3, and 5) were detected in positive ion mode (Figure S1A; Kivrak et al., 2014).

Identification of Organic Acids
Organic acids play an indispensable role in the treatment of diseases. For example, phenolic acids have antibacterial and anti-inflammatory effects. Some polyunsaturated fatty acids lower blood lipids and help prevent cardiovascular disease (Simopoulos, 1991). Therefore, the identification of this kind of compound will play a pivotal role in elucidating the mechanism of PT. According to previous research, organic acids show a better MS response in negative ion mode than in positive ion mode, with characteristic neutral losses such as CO2 (44 Da), CO (28 Da), and H2O (18 Da), which help to identify compounds rapidly (Figure S1B; Liu et al., 2016). Based on these fragmentation rules, 6 small phenolic acids (Peaks 2, 4, 7, 9, 10, 13) were tentatively identified from their low molecular weight, relatively short retention time and characteristic neutral losses. Peak 2 displayed the molecular formula C6H8O7 at m/z 191 [M-H]−. Its characteristic product ion at m/z 111 in the MS2 spectrum was consistent with the sequential losses of 2H2O (36 Da) and CO2 (44 Da). By further confirming the chromatographic and mass spectral information against the reference literature, the compound was identified as citric acid (Geng et al., 2014). Peaks 4, 9, and 13 yielded [M-H]− ions at m/z 117, 121, and 163, respectively. They gave the expected fragments at m/z 73, 77, and 119, respectively, corresponding to the elimination of a CO2 moiety. Peak 7 displayed an [M-H]− ion at m/z 137, and its chemical composition was deduced as C7H6O3; the MS2 ions at m/z 119 and 108 suggested a successive loss of two H2O moieties from the parent ion. By further comparing their retention behaviors, precise molecular weights and MS/MS spectra with reported data, peaks 4, 7, 9 and 13 were identified as succinic acid, protocatechuic aldehyde, benzoic acid and p-coumaric acid, respectively (Penner et al., 2010; Fan et al., 2011; Xie et al., 2011). Peak 10 exhibited an abundant parent ion [M-H]− at m/z 335 and fragment ions at m/z 191 and m/z 179, which implied the presence of quinic acid and caffeic acid moieties. By comparing its chromatographic retention time, mass spectral information and molecular weight with reference data, peak 10 was tentatively identified as caffeoylshikimic acid (Figure S1B) (Abu-Reidah et al., 2013; García-Salas et al., 2014). In addition, 17 fatty acids (peaks 20, 24, 25, 27, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, and 42) were also tentatively identified by comparison with literature data (Ma et al., 2006; Cho et al., 2012; Liu et al., 2012; Ferreiro-Vera et al., 2013).
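The neutral-loss reasoning used for the organic acids is simple arithmetic on nominal masses, and can be checked with a short sketch. The helper names are hypothetical, and nominal (integer) masses are used for illustration only; an actual assignment would rely on accurate masses.

```python
from itertools import combinations_with_replacement

# Nominal masses (Da) of the characteristic neutral losses named in the text.
NEUTRAL_LOSSES = {"H2O": 18, "CO": 28, "CO2": 44}


def expected_fragment(precursor_mz, losses):
    """Nominal m/z left after removing the listed neutral losses."""
    return precursor_mz - sum(NEUTRAL_LOSSES[name] for name in losses)


def explain_fragment(precursor_mz, fragment_mz, max_losses=3):
    """All combinations of up to max_losses neutral losses that give the fragment."""
    hits = []
    for k in range(1, max_losses + 1):
        for combo in combinations_with_replacement(sorted(NEUTRAL_LOSSES), k):
            if expected_fragment(precursor_mz, combo) == fragment_mz:
                hits.append(combo)
    return hits


# Peak 2 (citric acid): m/z 191 -> 111 by loss of 2 H2O and CO2.
print(explain_fragment(191, 111))
# Peaks 4, 9 and 13: loss of CO2 from m/z 117, 121 and 163.
for precursor, fragment in [(117, 73), (121, 77), (163, 119)]:
    print(precursor, fragment, explain_fragment(precursor, fragment))
```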

Identification of Flavonoids
Flavonoids are considered to be the main bioactive components in PT (Tao et al., 2011). The identification of flavonoids was divided into two steps: identification of flavonoid aglycones and identification of flavonoid glycosides. Flavonoids often undergo Retro-Diels-Alder (RDA) reactions, which are the main feature of flavonoid fragmentation. The C-ring of the flavonoid breaks into two parts, denoted i,jA− and i,jB−, where i and j indicate the positions of the broken C-ring bonds. The 1,3A− ion frequently appears among the fragments of flavonoids and serves as a diagnostic ion. In addition, a few small neutral losses, such as CO (28 Da), CO2 (44 Da), C2H2O (42 Da) and H2O (18 Da), tend to occur as a consequence of RDA reactions. Flavonoid glycosides carry sugar substituents, so their fragmentation was divided into the cleavage of the sugar substituents and the cleavage of the flavonoid aglycone. Generally, the sugar substituents are lost stepwise, leaving the aglycone ion [Y0]−, alone or together with the radical aglycone anion [Y0−H]−•, followed by RDA reactions (Ma et al., 2015). The cleavage sequence can help to determine the linkage position.
According to previous research, the flavonoids in PT are mainly quercetin, kaempferol, isorhamnetin and their derivatives (Han et al., 2012). After these derivatives lose their glycosyl groups, the aglycone ion and/or radical aglycone anion are usually observed in negative ion mode, with varying abundance, at m/z 301/300 (quercetin), m/z 285/284 (kaempferol) and m/z 315/314 (isorhamnetin). However, compounds that produce similar ions are not necessarily the same kind of substance. Thus, RDA diagnostic fragment ions were needed to further determine and classify the compounds. The glycosyl groups attached to the flavonoid aglycones usually consist of rhamnosyl (rha, 146 Da), glucosyl (glc, 162 Da) or a combination of the two. These features are helpful for the identification of quercetin, kaempferol, isorhamnetin and their derivatives (Ma et al., 2015).
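The glycoside rule described above (aglycone ion plus rhamnosyl and glucosyl residues) can be sketched as a small enumeration on nominal masses. The function name is hypothetical, and the precursor values 623, 769 and 447 are nominal [M-H]− values assumed from the standard molecular masses of isorhamnetin-3-O-neohesperidoside, isorhamnetin-3-O-(2G-α-L-rhamnosyl)-rutinoside and astragalin; they are not read from Table 1.

```python
# Nominal m/z of the aglycone ions [Y0]- given in the text.
AGLYCONE_IONS = {"quercetin": 301, "kaempferol": 285, "isorhamnetin": 315}
# Nominal residue masses (Da) lost on cleavage of each sugar.
RHA, GLC = 146, 162


def glycosylation_candidates(precursor_mz, max_sugars=3):
    """List (aglycone, n_rha, n_glc) combinations matching a nominal [M-H]- value."""
    hits = []
    for aglycone, y0 in AGLYCONE_IONS.items():
        for n_rha in range(max_sugars + 1):
            for n_glc in range(max_sugars + 1 - n_rha):
                if n_rha + n_glc == 0:
                    continue  # the free aglycone is not a glycoside
                if y0 + n_rha * RHA + n_glc * GLC == precursor_mz:
                    hits.append((aglycone, n_rha, n_glc))
    return hits


for mz in (623, 769, 447):
    print(mz, glycosylation_candidates(mz))
```

For m/z 447 the enumeration returns both a kaempferol glucoside and a quercetin rhamnoside, which illustrates why the precursor mass alone is not sufficient and why the aglycone ion (285 vs. 301) and the RDA diagnostic ions are needed to tell such candidates apart.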
Peaks 16 and 23 were detected in both the positive and negative ion modes. By comparing retention time, mass data and molecular formula mass accuracy (within 5 ppm) with the reference literature, peaks 22 and 28 were identified as astragalin and kaempferol (Tao et al., 2011

Identification of Other Types
The fragmentation of peaks 6 and 17 did not conform to that of the chemical classes described above, so they were classified as others. Peak 6 displayed an [M+HCOOH]− ion at m/z 387. The MS2 ion at m/z 179 suggested that a glycosyl group was cleaved from the parent ion. Based on the small molecular weight and the characteristics of the glycosyl group, compound 6 was preliminarily deduced to be a saccharide. After further comparison of the other MS2 product ions at m/z 161, 119 and 113 with the reported literature, peak 6 was tentatively identified as sucrose (Wang et al., 2014). Peak 17 was detected in positive ion mode. By comparison with an authentic standard, it was unambiguously identified as umbelliferone (Figure S1D; Avula et al., 2016). All the information is listed in Table 1.

Multivariate Statistical Analysis
Metabolomics analysis was a crucial step in screening the combinatorial quality markers of natural products; it was used to screen for the discriminatory markers that could represent the quality of natural products. All chromatographic data of the different batches obtained from UHPLC-Q-TOF/MS were processed by the XCMS software. A three-dimensional data matrix comprising sample information, variables (retention time and m/z value) and their corresponding intensities was assembled and tabulated (Huang et al., 2013). The P-values of t-tests were then calculated, and the variables with P-values less than 0.05 were chosen. Furthermore, partial least squares discriminant analysis (PLS-DA), a supervised multivariate data analysis technique characterized by variable selection criteria and potential biomarker output, was employed to select the significant markers from the variables with P-values less than 0.05. PLS-DA ranks variables by their variable importance in projection (VIP), that is, by their contribution to the separation of the clusters, so the markers with VIP > 1 were selected as significant. In our study, the chromatographic data of the P1-P32 and CP1-CP17 PT samples were acquired in the negative-ionization mode of UHPLC-Q-TOF/MS. A total of 1,091 variables with P-values less than 0.05 were obtained. According to the VIP, 311 markers were selected. These 311 markers were considered to be the significant markers that could distinguish crude PT from carbonized PT.
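The VIP > 1 criterion can be illustrated with the commonly used VIP definition. This is a sketch, not the Simca-P implementation: it assumes the standard formula, in which each variable's squared, normalized PLS weight is weighted by the Y variance explained by each component, and it takes the weights, explained sums of squares and P-values as given inputs. All names and numbers are hypothetical.

```python
from math import sqrt


def vip_scores(weights, ss_explained):
    """VIP_j = sqrt(p * sum_a(SS_a * w_ja^2 / ||w_a||^2) / sum_a(SS_a)).

    weights[a][j] is the PLS weight of variable j in component a;
    ss_explained[a] is the Y sum of squares explained by component a.
    """
    n_comp = len(weights)
    n_vars = len(weights[0])
    total_ss = sum(ss_explained)
    norms_sq = [sum(w * w for w in comp) for comp in weights]
    vips = []
    for j in range(n_vars):
        acc = sum(ss_explained[a] * weights[a][j] ** 2 / norms_sq[a]
                  for a in range(n_comp))
        vips.append(sqrt(n_vars * acc / total_ss))
    return vips


def select_markers(p_values, vips, alpha=0.05, vip_cutoff=1.0):
    """Indices of variables with P < alpha and VIP > cutoff."""
    return [j for j, (p, v) in enumerate(zip(p_values, vips))
            if p < alpha and v > vip_cutoff]


# Hypothetical two-component model on four variables.
weights = [[0.8, 0.5, 0.2, 0.1], [0.1, 0.7, 0.6, 0.2]]
ss_explained = [6.0, 2.0]
vips = vip_scores(weights, ss_explained)
p_values = [0.001, 0.020, 0.030, 0.400]  # hypothetical t-test P-values
print([round(v, 3) for v in vips])
print(select_markers(p_values, vips))
```

Under this definition the mean of the squared VIP values is exactly 1, which is why VIP > 1 is the usual threshold for an above-average contribution to the class separation.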

Identification of the Final Dominated Combinatorial Markers
To explore the final dominant combinatorial quality markers among the above qualitative compounds, three rules were employed. The first and second rules are to select compounds that are easily quantified and commercially available, respectively. The third rule is that the potential markers should represent the whole chemical information of the natural product with high accuracy. Thus, random forest (RF) and adaptive boosting (AdaBoost), two off-the-shelf supervised learning models, were introduced to validate the accuracy of the markers selected at every screening step. In every model, half of the samples were randomly assigned to the training set and the remaining batches to the testing set. The training set was used to establish the model, and its output values were regarded as prediction scores. Then, the established model was verified by predicting the testing set, and the accuracy values were obtained. In this study, isorhamnetin-3-O-(2G-α-L-rhamnosyl)-rutinoside, umbelliferone, kaempferol, isorhamnetin-3-O-neohesperidoside and astragalin were selected from the 11 qualitative compounds according to the first and second rules. Then, RF and AdaBoost were applied for the categorization and prediction of the 49 batches of crude PT (P1-P16: training set; P17-P32: testing set) and carbonized PT (CP1-CP8: training set; CP9-CP17: testing set), based on the datasets obtained from UHPLC-Q-TOF/MS for the 1,091, 311, 11 and 5 compounds. The accuracy values are summarized in Table 2. In every model, the accuracies for the 1,091, 311, 11 and 5 markers were all higher than 90%. Remarkably, the accuracies for the 5 markers were all above 92%, which approached the accuracy for the 1,091 variables. Therefore, the 5 markers could replace the 1,091 variables as combinatorial quality markers for distinguishing crude PT from carbonized PT.
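The train/test validation idea can be sketched with a small AdaBoost classifier built on one-feature threshold rules (decision stumps). This is an illustrative pure-Python version, not the Matlab implementation used in the study; random forest is omitted for brevity, and the marker intensities and the alternating split are hypothetical.

```python
from math import exp, log


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best threshold rule."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # The rule predicts `sign` when the feature is <= thr, else `-sign`.
                err = sum(wi for wi, row, t in zip(w, X, y)
                          if (sign if row[j] <= thr else -sign) != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign


def adaboost_fit(X, y, n_rounds=10):
    """Fit AdaBoost with decision stumps; labels must be +1 or -1."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        if stump[0] >= 0.5:  # no better than chance: stop boosting
            break
        err = max(stump[0], 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * log((1 - err) / err)
        model.append((alpha, stump))
        if stump[0] == 0:  # training data already separated
            break
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1


def accuracy(model, X, y):
    return sum(adaboost_predict(model, r) == t for r, t in zip(X, y)) / len(y)


# Hypothetical intensities of 5 markers; +1 = crude PT, -1 = carbonized PT.
crude = [[100 + i, 80 - i, 20 + i % 3, 15, 30 + i] for i in range(8)]
carbonized = [[40 + i, 35 + i % 4, 18 + i % 3, 14, 25 + i] for i in range(8)]
X = crude + carbonized
y = [1] * len(crude) + [-1] * len(carbonized)
# Half of the batches for training, the other half for testing.
X_train, y_train = X[0::2], y[0::2]
X_test, y_test = X[1::2], y[1::2]
model = adaboost_fit(X_train, y_train)
print(f"test accuracy = {accuracy(model, X_test, y_test):.2f}")
```

Repeating the fit with the 1,091, 311, 11 and 5 variable sets and comparing the test accuracies mirrors the comparison reported in Table 2.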

The Quantitative Analysis of Different Batches of PT and Its Processed Product
Quantitative analysis of the combinatorial quality markers was a key step in the proposed strategy, because it can reveal the routine variation of natural products and prepare for establishing the quality specification. The validated UHPLC-PDA method was subsequently used to simultaneously determine the 5 combinatorial markers (isorhamnetin-3-O-(2G-α-L-rhamnosyl)-rutinoside, isorhamnetin-3-O-neohesperidoside, astragalin, kaempferol and umbelliferone) in 34 batches of PT samples (P1-P23 are raw materials, CP1-CP11 are processed products). The results are shown in Table 3. They showed that there were significant differences between crude and carbonized PT. The contents of isorhamnetin-3-O-(2G-α-L-rhamnosyl)-rutinoside and isorhamnetin-3-O-neohesperidoside were obviously reduced after carbonization, while the contents of astragalin, kaempferol and umbelliferone were slightly reduced. In summary, the contents of the 5 components in raw PT were higher than those in processed PT. This demonstrated that the processing caused the quality difference between crude and carbonized PT samples and probably changed the pharmacological effects, a phenomenon that warrants in-depth research.

Discriminant Analysis
For natural products, and in particular for crude TCMs and their processed products, discriminant analysis is essential. It can be used to build a predictive model of group membership based on the observed characteristics of the screened combinatorial quality markers in samples of known origin. It produces a discriminant function, which can then be applied to new samples to assign them rapidly to a group. In our case of PT, a total of 311 markers with VIP > 1 were selected from 1,091 variables, and then 11 markers were identified. Finally, 5 of the 11 qualitative markers were quantified. The 49 batches of PT (P1-P32 and CP1-CP17), comprising crude PT and carbonized PT, were then used to develop the PLS-DA model again, in order to validate whether they could be grouped into two parts according to the 1,091, 311, 11 and 5 markers in the dataset obtained from UHPLC-Q-TOF/MS. The two groups were separated as well with the 5 markers as with the 1,091 markers (Figure 4). Therefore, these 5 combinatorial quality markers were subjected to discriminant analysis in SPSS in order to build a predictive model of the two groups based on the observed features of the 49 batches of PT (P1-P32 and CP1-CP17). It produced a discriminant function in the unstandardized canonical discriminant analysis model in terms of the predictor variables that distinguish between the two groups. The discriminant function equation was as follows:
γ = −0.000000264573441146107X1 + 0.000000246093281974065X2 + 0.000006260309339768010X3 + 0.000002816986003780560X4 − 0.000000260116442845880X5 − 0.208119936952103
where X1 to X5 represent the MS responses of isorhamnetin-3-O-neohesperidoside, isorhamnetin-3-O-(2G-α-L-rhamnosyl)-rutinoside, umbelliferone, kaempferol and astragalin, respectively. The classification result showed that 95.9% of the originally grouped cases and 93.9% of the cross-validated grouped cases were correctly classified.
The above results indicated that the discrimination model was reliable. Therefore, an unknown sample can be rapidly assigned to a group by using the discriminant function. The cutoff value was calculated to be 0.6435, which is the average of the two group centroids, (−1.458 + 2.745)/2. If the discriminant score of a sample is lower than this value, the sample is classified as crude PT; otherwise, it is assigned to the carbonized PT group. In order to validate the accuracy of this predictive model, 22 batches of crude PT and carbonized PT samples (VP1-VP12 and VCP1-VCP10) were classified by the discriminant function (Table S5). According to the discriminant function, PT6 was assigned to Group 2, which may be because the contents vary between batches with the producing area and the processing method. Meanwhile, the samples VP1-VP12 and VCP1-VCP10 were successfully divided into two groups, which further indicated that the above model was reliable and accurate.
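Applying the discriminant function to a new sample can be sketched as follows. The coefficients, intercept and centroids are those reported above; the assignment of −1.458 to crude PT is inferred from the statement that scores below the cutoff indicate crude PT, and the function names and the example MS responses are hypothetical.

```python
# Coefficients of the unstandardized discriminant function (X1..X5), as reported.
COEFFS = (
    -0.000000264573441146107,  # X1: isorhamnetin-3-O-neohesperidoside
    0.000000246093281974065,   # X2: isorhamnetin-3-O-(2G-α-L-rhamnosyl)-rutinoside
    0.000006260309339768010,   # X3: umbelliferone
    0.000002816986003780560,   # X4: kaempferol
    -0.000000260116442845880,  # X5: astragalin
)
INTERCEPT = -0.208119936952103

# Group centroids as reported; which group is crude PT is inferred from the text.
CENTROID_CRUDE = -1.458
CENTROID_CARBONIZED = 2.745
CUTOFF = (CENTROID_CRUDE + CENTROID_CARBONIZED) / 2  # 0.6435


def discriminant_score(ms_responses):
    """Evaluate the discriminant function for the five MS responses X1..X5."""
    if len(ms_responses) != len(COEFFS):
        raise ValueError("expected five MS responses (X1..X5)")
    return sum(c * x for c, x in zip(COEFFS, ms_responses)) + INTERCEPT


def classify(ms_responses):
    """Scores below the cutoff are assigned to crude PT, the rest to carbonized PT."""
    return "crude PT" if discriminant_score(ms_responses) < CUTOFF else "carbonized PT"


# Hypothetical MS responses for one unknown batch (not data from this study).
unknown = [4.0e6, 6.0e6, 1.0e5, 2.0e5, 3.0e5]
print(f"score = {discriminant_score(unknown):.3f} -> {classify(unknown)}")
```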

CONCLUSION
With the increasing demand for natural products in medical care and dietary supplements, the quality control of natural products has been pushed into the spotlight and has become a critical issue. In this work, a comprehensive strategy of metabolomics coupled with chemometrics was established and validated for the first time to identify the discriminatory combinatorial quality markers of natural products, taking crude and carbonized PT as an example. Five markers [astragalin, kaempferol, umbelliferone, isorhamnetin-3-O-neohesperidoside and isorhamnetin-3-O-(2G-α-L-rhamnosyl)-rutinoside] were successfully screened, identified, quantified and regarded as combinatorial quality markers for precise quality evaluation. The proposed comprehensive strategy is not only useful in distinguishing crude PT from carbonized PT, but also provides an efficient tool for screening the combinatorial quality markers of other natural products.