A Mass Spectrometry Based Metabolite Profiling Workflow for Selecting Abundant Specific Markers and Their Structurally Related Multi-Component Signatures in Traditional Chinese Medicine Multi‐Herb Formulae

In Traditional Chinese Medicine (TCM), herbal preparations often consist of a mixture of herbs. Their quality control is challenging because every single herb contains hundreds of components (secondary metabolites). A typical 10-herb TCM formula was selected to develop an innovative strategy for its comprehensive chemical characterization and to study the specific contribution of each herb to the formula in an exploratory manner. Metabolite profiles of the TCM formula and of the extract of each single herb were acquired with liquid chromatography coupled to high-resolution mass spectrometry for qualitative analyses, and to evaporative light scattering detection (ELSD) for semi-quantitative evaluation. The acquired data were organized as a feature-based molecular network (FBMN), which provided a comprehensive view of all types of secondary metabolites and their occurrence in the formula and in all single herbs. These features were annotated by combining MS/MS-based in silico spectral matching, manual evaluation of the structural consistency in the FBMN clusters, and taxonomy information. ELSD was used as a filter to select the most abundant features. At least one marker per herb was highlighted based on its specificity and abundance. A single large-scale fractionation of the enriched formula enabled the isolation and formal identification of most of them. The obtained markers allowed an improved annotation of associated features by manually propagating this information through the FBMN. These data were incorporated in the high-resolution metabolite profiling of the formula, which highlighted specific series of components related to each individual herb marker. These series of components, named multi-component signatures, may serve to improve the traceability of each herb in the formula.
Altogether, the strategy provided highly informative compositional data of the TCM formula and detailed visualizations of the contribution of each herb by FBMN, filtered feature maps, and reconstituted chromatogram traces of all components linked to each specific marker. This comprehensive MS-based analytical workflow allowed a generic and unbiased selection of specific and abundant markers and the identification of multiple related sub-markers. This exploratory approach could serve as a starting point to develop simpler and more targeted quality control methods, with marker specificity selection criteria adapted to a given TCM formula.
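The selection of markers by specificity and abundance can be illustrated with a minimal sketch. It is not the implementation used in this work: the `Feature` structure, the `min_fraction` and `min_elsd_area` thresholds, and the example values are all hypothetical, and specificity is approximated here as the share of a feature's MS peak area found in a single herb extract.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    feature_id: int
    herb_areas: dict   # hypothetical: MS peak area per single-herb extract
    elsd_area: float   # hypothetical: ELSD peak area in the formula (0 if no peak)


def specific_herb(feature, min_fraction=0.95):
    """Return the herb holding at least min_fraction of the feature area, else None."""
    total = sum(feature.herb_areas.values())
    if total <= 0:
        return None
    herb, area = max(feature.herb_areas.items(), key=lambda kv: kv[1])
    return herb if area / total >= min_fraction else None


def select_markers(features, min_elsd_area=1.0, min_fraction=0.95):
    """Keep features that pass the ELSD abundance filter and are herb-specific."""
    markers = {}
    for f in features:
        if f.elsd_area < min_elsd_area:   # abundance filter (assumed threshold)
            continue
        herb = specific_herb(f, min_fraction)
        if herb is None:                  # shared between herbs: not a marker
            continue
        markers.setdefault(herb, []).append(f)
    for herb in markers:                  # most abundant candidates first
        markers[herb].sort(key=lambda f: f.elsd_area, reverse=True)
    return markers


# Illustrative values only, not data from this study.
features = [
    Feature(1, {"herb_A": 100.0, "herb_B": 0.0}, 5.0),
    Feature(2, {"herb_A": 980.0, "herb_B": 20.0}, 12.0),
    Feature(3, {"herb_A": 500.0, "herb_B": 500.0}, 30.0),
    Feature(4, {"herb_A": 0.0, "herb_B": 300.0}, 0.0),
]
markers = select_markers(features)
marker_ids = {herb: [f.feature_id for f in fs] for herb, fs in markers.items()}
```

In this toy example, feature 3 is abundant but shared between two herbs, and feature 4 is specific but gives no ELSD peak, so neither is retained.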


Visualization of the contribution of each herb to the formula and ELSD filtering
These are additional figures for sections 2.2 and 2.3.
[Figure caption, partial] ... (SO), with ELSD peak numbering corresponding to subsequent annotations and identifications.

Detailed annotations of the specific and abundant components

Annotation workflow
For the features selected by ELSD filtering, the ISDB annotations obtained in the FBMN were verified and completed in detail. The annotation strategy combined molecular formula (MF) assignment with taxonomic data, since most of the constituent herbs were rather well documented from a phytochemical point of view.
HRMS/MS and UV spectra were used for checking annotation consistency (Main text Fig. 1.6 and Suppl. Mat. Fig. 5). The number of potential structures for the MF retrieved from HRMS data was reduced by applying a taxonomic filter (from the species to the family level) (Suppl. Mat. Fig. 5).
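As an illustration of the taxonomic filter, the sketch below keeps the candidate structures reported at the narrowest matching taxonomic level. Widening the search step by step from species to genus to family is an assumption about how the filter is applied, and the `Candidate` structure and all names in the example are hypothetical placeholders.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str      # candidate structure matching the MF
    species: str   # taxon in which the structure has been reported
    genus: str
    family: str


def taxonomic_filter(candidates, species, genus, family):
    """Return (level, candidates) for the narrowest level with at least one hit."""
    levels = (("species", species), ("genus", genus), ("family", family))
    for level, taxon in levels:
        kept = [c for c in candidates if getattr(c, level) == taxon]
        if kept:
            return level, kept
    return None, []   # no candidate reported in the family


# Placeholder names only, not data from this study.
candidates = [
    Candidate("structure_1", "Genus_x species_b", "Genus_x", "Family_f"),
    Candidate("structure_2", "Genus_y species_c", "Genus_y", "Family_f"),
    Candidate("structure_3", "Genus_z species_d", "Genus_z", "Family_g"),
]
level, kept = taxonomic_filter(
    candidates, "Genus_x species_a", "Genus_x", "Family_f"
)
kept_names = [c.name for c in kept]
```

Here no candidate is reported for the species itself, so the filter falls back to the genus level and retains a single structure.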
Annotation consistency was checked by comparing the hits obtained by MF assignment from HRMS followed by taxonomic filtering with the hits obtained by ISDB spectral scoring (top 6 hits) (Suppl. Mat. Fig. 5). When necessary, UV spectra were used as orthogonal information to confirm or discriminate between the classes of compounds.
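The consistency check can be sketched as an intersection between the taxonomy-filtered candidates and the top-ranked spectral hits. This is a simplified illustration under stated assumptions: structures are compared by plain string identifiers, the function name is hypothetical, and the example identifiers are placeholders.

```python
def consistent_annotations(taxonomy_hits, isdb_ranked_hits, top_n=6):
    """Return the top_n spectral hits also retained by the taxonomic filter,
    in spectral-score rank order."""
    allowed = set(taxonomy_hits)
    return [s for s in isdb_ranked_hits[:top_n] if s in allowed]


# Placeholder identifiers only, not data from this study.
taxonomy_hits = ["s1", "s4", "s9"]
isdb_ranked_hits = ["s7", "s4", "s2", "s1", "s5", "s6", "s9"]  # best score first
consistent = consistent_annotations(taxonomy_hits, isdb_ranked_hits)
```

An empty result would flag a feature for which the two annotation routes disagree and for which an orthogonal check, such as the UV spectrum, is needed.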
Finally, a comparison with the markers referenced in the European and Chinese Pharmacopoeias was performed.
Three detailed examples of this annotation process are presented in Suppl. Mat. Fig. 6 to 8.
The annotations presented below were obtained from a first batch of metabolite profiling (data not shown). This workflow resulted in the thorough annotation of 22 potential markers among the features selected by ELSD, which included between 2 and 4 potential markers per herb, with the exception of Angelica sinensis, for which no ELSD peak was detected. The combination of MF assignment, taxonomic filter, MS/MS scoring and orthogonal UV check provided a single highly probable structure in half of the cases, and up to 4 putative structures belonging to the same chemical class of components in the other half (Suppl. Mat., Table S1). Interestingly, the results of the taxonomic filtering and ISDB annotations were consistent in 80% of cases. Furthermore, orthogonal control by UV spectra allowed discrimination in all cases where the annotated structures still belonged to more than one class of compounds after taxonomic filtering. Finally, the verification of the annotations against the Pharmacopoeias made it possible to discriminate between 2 annotations and, in one case, to modify the annotation and select the official marker, astilbin, which is not referenced in the DNP for S. glabra.

Summary of annotations
[Supplementary tables: table bodies not recovered; shared footnotes] ... proposed in (Sumner et al., 2007). d PI: positive ionization; e NI: negative ionization; f CI: cluster index; g: identification number in MZmine and Cytoscape; h CRC number, structure available in (DNP, 2019b); i ND-F: not detected in the formula.

[Supplementary figure caption, partial] ... cluster with potential co-markers for A. sinensis; F) cluster with potential co-markers for S. flavescens. The color tones of the nodes indicate the features with the same retention time. The numbers indicate the m/z ratio of the nodes, followed by their retention time. See Tables S3, S5, S11 and S12 for related annotation. G) FBMN. See Tables S4, S8, S8 and S9 for related annotation. See Fig. 4D for the node layout legends.

UHPLC-UV-PDA HRMS/MS data acquisition
The analyses were performed on an Acquity UPLC system interfaced to an Orbitrap Q-Exactive Focus mass spectrometer (Thermo Scientific) using a heated electrospray ionization (HESI-II) source and an Acquity UPLC