Investigating metformin-active substances from different manufacturing sources by NIR, NMR, high-resolution LC-MS, and chemometric analysis for the prospective classification of legal medicines

Introduction: The characterisation of active substances is an essential tool to ensure the traceability and authenticity of legal medicines. Metformin is a well-established biguanide derivative recommended in oral formulations as a first-line treatment for type 2 diabetes. With its increasing demand, metformin is likely to be an attractive target for falsification and substandard production, thus posing health risks to consumers. Methods that are able to identify even small differences in active pharmaceutical ingredients (APIs) are deemed necessary. The detection of fraudulent practices in APIs is not straightforward, and a single technique that can provide sufficient information to unambiguously address this issue is still not available. Methods: This study investigated an integrated analytical platform based on NIR, 1H-NMR, 13C-NMR, and high-resolution LC-MS combined with chemometrics to profile 32 metformin hydrochloride samples originating from several global authorised manufacturers. The study's aim was to explore differences in the chemical characteristics of metformin hydrochloride APIs to identify or predict a possible classification for each manufacturer in view of prospective authenticity studies. Different pre-processing methods were applied; bucket tables for 1H- and 13C-NMR were obtained, while mass spectrometry data were processed in targeted and untargeted modes. Datasets were analysed individually and after merging using an unsupervised multivariate method, principal component analysis (PCA). Results and Discussion: The results evidenced differences in clustering behaviour depending on the manufacturer. Each technique showed a specific clustering tendency, highlighting how different analytical approaches are able to characterise metformin APIs. Some manufacturers’ samples, however, showed similar behaviour independently of the technique.
NIR and 1H-NMR were confirmed as the most predictive techniques when taken individually; 1H-NMR, in particular, achieved good separation between the samples of the two most representative manufacturers. For LC-MS, the targeted approach gave a clearer separation into groups than the untargeted approach. Nevertheless, the untargeted LC-MS approaches presented in this paper could be an alternative for obtaining additional information on drug substances with several different and complex synthetic pathways leading to several unknown impurities. Further grouping of manufacturers emerged by data fusion, highlighting its potential in the traceability of metformin.

KEYWORDS APIs, chemometrics, nuclear magnetic resonance, near infrared, liquid chromatography-mass spectrometry, falsification

Introduction
Diabetes mellitus is a chronic metabolic disorder of multiple aetiologies characterised by hyperglycaemia; it affects over 400 million people worldwide (World Health Organization, 2022). Type 2 diabetes (adult-onset or non-insulin-dependent diabetes) accounts for 90% of diagnosed cases of diabetes and represents an increasing threat to public health, with significant mortality and co-morbidities. Metformin is a well-established biguanide derivative (Figure 1) recommended in oral formulations as a first-line treatment for type 2 diabetes due to its: i) efficacy in controlling blood glucose levels with a low risk of hypoglycaemia, ii) potential use in monotherapy or in combination with other glucose-lowering agents, and iii) low costs of production (Viollet et al., 2012; Rojas and Gomes, 2013; Buse et al., 2020; Giaccari et al., 2021). The WHO lists metformin among the oral hypoglycaemic agents in its model list of essential medicines, that is, medicines that satisfy the priority healthcare needs of a population and are selected with due regard to disease prevalence and public health relevance, evidence of efficacy and safety, and comparative cost-effectiveness (World Health Organization, 2021).
With increasing demand, metformin is likely to be an attractive target for falsification and substandard production. This may pose health risks to consumers, considering the high dosage in adults (up to 3 g/day), its use in children (Khokhar et al., 2017), and its use in long-term therapy.
This risk could be due to low-quality drugs with unknown impurities or residual solvents sourced from the manufacturing process. Moreover, in 2019, the possible presence of the carcinogenic impurity N-nitrosodimethylamine (NDMA) in metformin products resulted in the introduction of new appropriate control strategies as additional quality requirements for the manufacturers (Keire et al., 2022). On the other hand, the branching of the supply chain increases the risk of the falsification of active pharmaceutical ingredients (APIs); therefore, the traceability of active substances represents an essential albeit demanding necessity (Raimondo et al., 2022).
The fraudulent use of metformin API, as well as other drug substances, from different unauthorised manufacturers cannot be excluded. This kind of fraud is proscribed as pharmaceutical falsification under European Directive 2011/62/EU (European Parliament and the Council of the European Union, 2011). The risks related to this kind of falsification arise from the use of low-cost APIs manufactured under different, uncontrolled, or unapproved processes.
Nevertheless, in many cases an API produced by unapproved manufacturers complies with the official quality controls (e.g., the controls prescribed by the European Pharmacopoeia specific monograph and general chapters). For this reason, methods able to detect even small differences in APIs originating from different manufacturers are deemed necessary.
In recent years, studies of fingerprinting analysis to discover possible falsifications of medicinal products or APIs have been performed (Anzanello et al., 2014; Custers et al., 2014; Acevska et al., 2015; Custers et al., 2016a; Raimondo et al., 2020; Deconinck et al., 2022). The European Official Medicines Control Laboratories (OMCLs) network performed fingerprinting studies on APIs of different manufacturers/origins to develop a tool to detect falsification of the origin of active ingredients (Raimondo et al., 2020; Rebiere et al., 2022).
The detection of fraudulent practices in APIs is not straightforward, and a single technique that can provide sufficient information to unambiguously address this issue is still not available. Recent articles have reported the fingerprinting of APIs of different origins by using a combination of different analytical techniques (Deconinck et al., 2022; Rebiere et al., 2022). Spectroscopy techniques combined with chemometrics are most commonly used for authentication and traceability (Biancolillo and Marini, 2018; Mees et al., 2018).
NMR spectra provide several kinds of information on the structure of the main molecule and its impurities (Winning et al., 2008; Pacholczyk-Sienicka et al., 2021). Indeed, the number of chemometric studies applied to NMR spectra is rapidly increasing due to the significant results that this analytical technique has demonstrated in the field of pharmaceuticals, food, and plants (Deconinck et al., 2022).
The combination of NMR spectroscopy and multivariate classification approaches has recently been used to identify the fingerprints of pharmaceutical chemical substances (Raimondo et al., 2020; Deconinck et al., 2022; Raimondo et al., 2022) and to detect the origin of biological molecules such as heparin (Colombo et al., 2022).
LC-MS is considered another analytical technique that, combined with chemometric analysis, offers significant information that can detect even slight differences among active substances (Nicolas and Scholz, 1998; Acevska et al., 2015). In this regard, LC-MS quadrupole time-of-flight (Q-TOF) relies on the chromatographic signal of the ion current or a specific region of chromatograms of trace organic impurities (Deconinck et al., 2008). This technique has usually been applied to identify specific compounds that can be linked to side reactions of the synthetic process of the active substance (Schneider and Wessjohann, 2010; Chen et al., 2022).
This study investigated an innovative integrated analytical platform based on NIR, 1H-NMR, 13C-NMR, and LC-MS Q-TOF (with targeted and untargeted approaches in data processing) combined with chemometric tools to profile 32 metformin hydrochloride samples originating from several authorised manufacturers distributed worldwide. The aim of the present study was to explore differences in the chemical characteristics of metformin hydrochloride APIs to identify or predict a possible classification for each manufacturer in view of prospective authenticity studies.

Sample collection
Metformin API samples were collected from Marketing Authorization Holders of medicinal products, upon request of the Italian Medicines Agency during post-marketing surveillance activities on the legal supply chain. In total, 32 samples from 11 worldwide manufacturers were collected by the National Authority and sent to the Italian OMCL for analysis, along with the release certificate from the manufacturer. Each sample was identified with a chemometric code (Xn) (see Table 1). Samples of one manufacturer were produced at sites located in two different countries, so a different chemometric code was assigned to each site. Multiple lots were made available from eight different metformin producers. Aliquots of samples were used for NIR, NMR, and LC-MS analyses.

Analytical detection
NIR, NMR, and LC-MS were selected in this study to provide information not only on the molecular structure of metformin hydrochloride but also on its impurity profile.

NIR spectroscopy
NIR spectra were acquired using an Agilent Cary 660 spectrometer (Agilent Technologies Inc., Santa Clara, California, United States) equipped with a NIR integrating sphere (PIKE NIR INTEGRATIR™) under the following analytical conditions: wavenumbers ranging from 4,000 cm−1 to 10,000 cm−1, resolution 4.0 cm−1, and 32 scans. The powder was transferred into a flat-bottom NIR-transparent glass vial and analysed at 20°C-25°C with no sample pre-treatment. Agilent Resolution Pro® Software version 5.2.0 was used to check the instrument performance and to process the spectra acquired in the absorbance scan type (Figure 2). The extracted data were exported and arranged in a dataset (32 samples and 3,113 wavelengths) before processing by chemometric analysis.

NMR spectroscopy
Dimethyl sulfoxide (DMSO-d6) at 99.9% deuteration degree with 0.03% (v/v) TMS (Cambridge Isotope Laboratories, CIL) was used. An amount of 10 ± 0.06 mg of the metformin hydrochloride active substance was dissolved in 3 mL of DMSO-d6 to obtain a 0.02 M solution. DMSO-d6 was selected on the basis of Gadape and Parikh (2011), also taking into account the solubility of metformin in D2O and DMSO-d6. Deuterated DMSO was intentionally used to obtain the overall spectra of metformin hydrochloride and its impurities and to check the correct assignment of all 1H-NMR protons; if D2O is used, all exchangeable NH protons disappear from the spectrum. The sample solution was heated at about 35°C under stirring and then vortexed for 1 min. The solution (0.7 mL) was transferred to an NMR tube.
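The stated concentration can be checked with a short calculation. The sketch below assumes a molar mass of 165.62 g/mol for metformin hydrochloride, a value that is not given in the text; the function name is illustrative only.

```python
# Molar mass of metformin hydrochloride (C4H11N5.HCl).
# Assumption: this value is not stated in the article.
MOLAR_MASS_G_PER_MOL = 165.62


def molarity(mass_mg: float, volume_ml: float,
             molar_mass: float = MOLAR_MASS_G_PER_MOL) -> float:
    """Return the concentration in mol/L for a mass (mg) dissolved in a volume (mL)."""
    moles = (mass_mg / 1000.0) / molar_mass
    return moles / (volume_ml / 1000.0)


# Nominal weighing and the two extremes of the 10 +/- 0.06 mg tolerance.
nominal = molarity(10.0, 3.0)
low = molarity(10.0 - 0.06, 3.0)
high = molarity(10.0 + 0.06, 3.0)
```

Under this assumption, the nominal value and both tolerance limits round to the 0.02 M reported above.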
NMR experiments were carried out on a Bruker Avance NEO spectrometer (Bruker BioSpin GmbH, Billerica, Massachusetts, United States) operating at a frequency of 600 MHz (14.1 T), equipped with SMART PROBE iProbe 5 mm with Z-gradient.
The spectra were processed manually using the data analysis software package Bruker TopSpin® version 4.1.3, applying 0.3 Hz line broadening, zero- and first-order phase correction, automatic polynomial baseline correction, and chemical shift calibration to the DMSO-d6 signal at 2.50 ppm.
The spectra of the 32 samples were aligned using AssureNMR® (Bruker Corporation, Billerica, Massachusetts, United States); starting from this processing, 1H- and 13C-bucketed tables were generated.
The 1H-NMR table was built from the spectral region between 8.00 ppm and 0.0 ppm. Each bucket corresponds to a chemical shift region with a bucketing width of 0.05 ppm. The signal regions of DMSO-d6 (2.64-2.36 ppm) and H2O (3.43-3.30 ppm) were excluded from the table. The integration mode was set to the sum of the intensities, and a scaling factor was applied so that the NMR spectra could be compared uniformly. The chosen option was scaling to the total integral of all buckets, which divides individual bucket intensities by the total spectral intensity (Bruker Corporation, 2019). To reduce noise, the spectra were smoothed using a Savitzky-Golay filter of 10.0 Hz. The data matrix of the 1H-NMR was composed of 160 variables.
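The bucketing and total-integral scaling described above can be illustrated with a minimal NumPy sketch. It is not the AssureNMR implementation: the function name and the input arrays are hypothetical, and the handling of excluded regions (their points are dropped while the 160 bucket positions are kept) is an assumption made to stay consistent with the reported number of variables.

```python
import numpy as np


def bucket_spectrum(ppm, intensity, start=8.0, stop=0.0, width=0.05,
                    excluded=((2.36, 2.64), (3.30, 3.43))):
    """Sum intensities in fixed-width ppm buckets and scale to the total integral.

    Buckets are returned in ascending ppm order (0.0 -> 8.0).
    Assumption: points inside the excluded solvent/water regions contribute
    nothing, but the bucket positions themselves are retained.
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    # Mask out the excluded regions (DMSO-d6 and H2O by default).
    keep = np.ones(ppm.shape, dtype=bool)
    for lo, hi in excluded:
        keep &= ~((ppm >= lo) & (ppm <= hi))

    n_buckets = int(round((start - stop) / width))
    edges = np.linspace(stop, start, n_buckets + 1)

    # Assign each point to a bucket; the upper edge falls in the last bucket.
    idx = np.clip(np.digitize(ppm, edges) - 1, 0, n_buckets - 1)
    inside = (ppm >= stop) & (ppm <= start) & keep

    sums = np.bincount(idx[inside], weights=intensity[inside],
                       minlength=n_buckets)
    total = sums.sum()
    return sums / total if total != 0 else sums
```

Applied to each of the 32 spectra in turn, such a function would give one row of 160 values per sample, each row summing to 1.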
The 13C-NMR buckets were created by defining the region at 160-35 ppm and a bucketing width of 1 ppm. The DMSO-d6 region (40.4-39.6 ppm) was excluded from the estimation of intensities. No spectral manipulation was applied, considering that the carbon spectrum of metformin is represented by only four signals. The data matrix of the 13C-NMR was composed of 125 variables.

LC-MS analyses
All reagents and solvents were of LC-MS grade. An amount of 10 ± 0.5 mg of each sample was weighed and dissolved in a 10 mL mixture acetonitrile/water 1:1 (v/v) containing 0.1% (v/v) formic acid. Samples were vortexed until dissolution was visually observed and then filtered through Nylon 0.22-µm filters. Each sample was analysed on the day of preparation. Procedural blanks were prepared with the same solvents used for samples and run in parallel; specifically, blank samples were injected in triplicate at the beginning of the analytical session and after each sample run.
MS analyses were performed on a Fast LC Mod. 1290 Infinity system coupled to a Q-TOF mass spectrometer detector Mod. G6520B (Agilent Technologies) equipped with a Dual ESI source working in the positive ion mode. Mass parameters were set as follows: the source's nitrogen temperature was 300°C, the drying gas flow rate was 10 L/min, the nebulizer was set at 40 psig, Vcap = 4,000 V, the fragmentor was 100 V, and the skimmer was 65 V. The MS acquisition range was 100-1,200 Da with a rate of 2 spectra/s. The system was calibrated with a mixture of reference masses at the beginning of each working day. The chromatographic analysis consisted of a 15 min linear gradient elution at a flow rate of 0.4 mL/min from 100% of mobile phase A containing 0.1% (v/v) of formic acid in 95:5 (v/v) water/acetonitrile to 100% of mobile phase B containing 0.1% (v/v) of formic acid in 5:95 (v/v) water/acetonitrile. The system was then returned to the initial conditions, which were kept for 5 min. The chromatographic column (1.8 µm Zorbax Extend-C18, 2.1 × 50 mm) was thermostated at 35°C. The injection volume was 1 µL, and the autosampler was thermostated at 15°C.

FIGURE 2
NIR spectra of metformin hydrochloride API samples.

Frontiers in Analytical Science frontiersin.org
Raw chromatographic and spectral data were extracted and processed using MassHunter Qualitative Analysis ® version B.07.00 software and MassHunter Profinder ® version 10.0 (Agilent Technologies). From each sample data file, raw data in the form of total ion chromatograms (TIC), i.e., total ion intensity vs. retention time (R.T.) from 0 to 15 min were extracted by MassHunter Qualitative Analysis ® . A shift of 3 m was observed in the R.T. due to the electronic characteristic of the instrument; therefore, data were aligned a posteriori.
A second set of data was obtained for each API sample in targeted mode from the extracted ion chromatograms (EIC) (intensity of the targeted ion vs. R.T.) of the known impurities reported in the EP Metformin Monograph (European Pharmacopoeia, 2022) and of other possible molecules. The extracted chromatogram of each calculated m/z value ([M + H]+ ions) was evaluated with a "yes/no" approach. Peak presence (signal/noise >3) was encoded by 1 and peak absence by 0.
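The "yes/no" encoding of the targeted EIC data can be sketched as follows. The function name and input layout (one row per sample, one column per targeted ion) are hypothetical, and how the noise level of each chromatogram is estimated is left open, since the text does not specify it.

```python
import numpy as np


def encode_presence(signal, noise, threshold=3.0):
    """Encode peak presence as 1 when signal/noise exceeds the threshold, else 0.

    `signal` and `noise` are arrays of the same shape (samples x target ions).
    Assumption: a non-positive noise estimate is treated as "no peak".
    """
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        sn = np.where(noise > 0, signal / noise, 0.0)
    return (sn > threshold).astype(int)


# Two samples, two targeted ions (illustrative numbers only).
codes = encode_presence([[30.0, 2.0], [0.0, 10.0]],
                        [[5.0, 1.0], [1.0, 2.0]])
```

The resulting 0/1 matrix is the kind of table that can be passed directly to PCA without further pre-treatment.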
A third processing approach in "untargeted mode" was tentatively assessed. MassHunter Profinder ® software (Agilent Technologies, 2017) was used for molecular feature extraction (MFE), followed by retention time and mass alignment across the sample dataset. MFE aims to remove chemical background and rapidly find feature peaks in total ion chromatograms by taking isotope distribution into account (Benito et al., 2018). Features were extracted with an algorithm (polynomial interpolation) for common organic molecules with the following filters: m/z range (100-1,200 m/z), peak height (>50 counts), ion species (protonated ion, sodium adduct, potassium adduct, and neutral loss of water), charge state (set to a maximum of 2), maximum exact mass (<1,000 Da), peak spacing tolerance (0.0025 m/z, plus 7 ppm), MFE score (70%), R.T. alignment (0.1%, plus 0.3 min), and mass alignment (5 ppm, plus 2 mDa). Finally, all the features were checked to remove those containing atypical peak shapes or unusual isotopic distributions. Of the extracted features, only those not present in all samples were regarded as possibly discriminating and considered. Peak presence was encoded by 1 and peak absence by 0. A list of 104 grouped multiple peak entities (named extracted compound chromatograms or ECCs), defined by their mass-to-charge ion ratios, retention time, and peak intensity, was created and then exported to an Excel datasheet for further analysis.
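The selection of possibly discriminating features can be illustrated with a small sketch operating on a binary presence table. This is not Profinder output handling: the function name is hypothetical, and dropping features that are absent from every sample is an added assumption, since such columns carry no information either.

```python
import numpy as np


def discriminating_features(presence):
    """Keep only features that are not present in all samples.

    `presence` is a samples x features table of 0/1 values.
    Returns the reduced table and the indices of the retained columns.
    Assumption: features absent from every sample are also dropped.
    """
    presence = np.asarray(presence).astype(bool)
    in_all = presence.all(axis=0)
    in_none = ~presence.any(axis=0)
    keep = ~(in_all | in_none)
    return presence[:, keep].astype(int), np.flatnonzero(keep)


# Three samples, four candidate features (illustrative values only).
table = [[1, 1, 0, 0],
         [1, 0, 0, 1],
         [1, 1, 0, 0]]
reduced, kept = discriminating_features(table)
```

In this toy table the first feature is common to all samples and the third is never detected, so only the second and fourth are retained.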

Chemometric methods
Analytical data were collected as numeric data for NIR, 1H-NMR, 13C-NMR, and LC-MS Q-TOF (TIC, EIC, and ECC data). Each dataset, obtained as previously described, was analysed individually by an unsupervised multivariate method, principal component analysis (PCA). Low- and mid-level fusions were then performed by combining two or three techniques and carrying out a PCA on these new datasets.
The Statistics and Machine Learning Toolbox (The MathWorks, Natick, MA, United States) and PCA_toolbox for MATLAB-version 1.4 (Milano Chemometrics and QSAR Research Group) (Ballabio, 2015) were used with MATLAB R2022b ® software (The MathWorks, Natick, MA, United States).
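For readers without access to MATLAB, the PCA step can be approximated with a short NumPy sketch based on the singular value decomposition. It is not the PCA_toolbox used in this study; the function name is illustrative, and the random matrix merely stands in for a real samples-by-variables dataset.

```python
import numpy as np


def pca_scores(X, n_components=2, scale=True):
    """PCA by SVD on a samples x variables matrix.

    The data are mean-centred and, if `scale` is True, autoscaled
    (each column divided by its standard deviation).
    Returns scores, loadings, and the explained variance ratio of each PC.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    if scale:
        std = X.std(axis=0, ddof=1)
        std[std == 0] = 1.0  # leave constant columns unscaled
        Xc = Xc / std
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Placeholder data: 32 samples x 10 variables of random numbers.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(32, 10))
scores, loadings, explained = pca_scores(X_demo, n_components=2)
```

Plotting the first score column against the second gives a score plot of the kind discussed in the Results; setting `scale=False` corresponds to mean-centring only.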

NIR spectroscopy
Several pre-processing methods were applied, before the application of PCA, in this order: multiplicative scatter correction (MSC), first and second derivative, standard normal variate (SNV), smooth processing, mean-centering, autoscaling, and a combination of these methods.
From the evaluation of different pre-treatments, the combination of SNV, first derivative, and mean-centering was used for the PCA.
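The selected pre-treatment chain (SNV, first derivative, mean-centering) can be sketched as below. The derivative algorithm is not specified in the text, so a simple central finite difference is used here as an assumption; a Savitzky-Golay derivative is a common alternative. Function names are illustrative.

```python
import numpy as np


def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def first_derivative(X, step=1.0):
    """First derivative along the spectral axis by finite differences (assumed method)."""
    return np.gradient(X, step, axis=1)


def preprocess_nir(X):
    """Apply SNV, then the first derivative, then column mean-centering."""
    Xp = first_derivative(snv(X))
    return Xp - Xp.mean(axis=0)
```

SNV removes additive and multiplicative scatter effects per spectrum, so spectra that differ only by an offset and a positive scale factor become identical after this step.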
The cross-validated PCA model showed that PC1 explained 92.38% of the variability. The score plot (Figure 3) of the NIR data showed a tendency to cluster into four groups.
The largest group, at positive PC1 and PC2, mostly comprises samples from manufacturers N and G, together with samples from other manufacturers: two samples from manufacturer L (L2 and L3), B2, A1, and H1. A second group comprising several samples is located at negative PC1 and positive PC2; the three M samples (Norwegian manufacturer) fall in this group. Samples P1 and P2 (French manufacturer) constitute a small group together with sample I1. Finally, samples E1 and E2 form the fourth group, with sample E3 slightly distant.
The T² value was calculated on the complete dataset (32 samples) to evaluate whether N2 and L1 could be regarded as outliers. The T² values of N2 and L1 (9.8434 and 12.4503, respectively) confirm their greater distance from the two principal groups. Moreover, when the datasets of manufacturers N and G were considered independently, the T² values > 1 of N3 and G3 confirmed that these samples behave as outliers within their specific manufacturer groups (Table 2). In this evaluation, the N2 sample was also confirmed as an outlier for manufacturer N. The PCA without the four outliers confirms the separation into the four groups described previously. It was not possible to define a clear wavelength area of the spectrum that would result in this grouping.

FIGURE 3
Score plot of PCA with two principal components using the NIR dataset.

Figure 4 shows the overlapped and aligned one-dimensional 1H-NMR spectra of the 32 samples of metformin hydrochloride. The singlet of the two equivalent methyl groups was observed at 2.92 ppm. The proton signal of the single -NH group is present at 7.20 ppm; the protons of the two =NH groups and of the -NH2 group are assigned to 6.64 ppm (Gadape and Parikh, 2011).
The data in the bucketed table were pre-treated before PCA using the autoscaling approach.
The PCA model with two PCs explained 77.3% of the variability. Figure 5A shows the PCA of the 1H-NMR dataset. An overall separation into two groups representing 93.7% of the sample population is evident. The remaining samples (n = 2, 6.3%) are distant from the two groups.
Sample F1 is the only one from manufacturer F, originating from Spain. Adjacent to F1 is sample L3, which lies far from L1 and L2. For the 13C-NMR, the PCA was performed using mean-centring pre-treatment and three components (68.0% explained variability). All samples clustered together in one cloud, although a grouping tendency was observed for some samples, such as those of manufacturer M (M1-M3) (Figure 5B).

LC-MS Q-TOF spectrometry
The evaluated data on LC-MS encompassed TIC data (total ion intensity vs. R.T.), EIC (extracted ion chromatograms), and ECC (extracted compound chromatograms). Only the EIC can be considered a targeted approach (Verzele et al., 2007), while both total ion intensity and molecular feature extraction (ECC data) are seen overall as untargeted analyses (Erny et al., 2016;Martínez-Bueno et al., 2019;Erny and Santos, 2021;Xue et al., 2022). Figure 6 shows an example of ECC of an unknown molecule in metformin samples with average mass = 326.0007 Da and R.T. at 7.3 min.
The PCA (Figure 7B) shows a separation of the samples into groups (explained variance 82.86%). The largest group, influenced by PC1, comprises all samples from producers M (M1-3) and E (E1-3), along with B (B1, B2), D, and a single sample from A. Samples G and N do not form distinct clusters and are fairly scattered across the plot. Sample H is an individual point, well separated from all the others. All L samples are grouped together. The loading most associated with this group is the m/z signal at 155.1040 (impurity C of the Eur. Ph. monograph).
As with EIC, the ECC dataset (104 variables) was not pretreated, and two principal components explained 81.6% of variability. All samples were grouped together in one cloud without an evident clustering, although grouping tendency was observed in samples from manufacturer E (E1-3) ( Figure 7C).

Fusion of the analytical datasets
Fusion was performed by sequentially combining two or three analytical approaches among NIR, 1H-NMR, 13C-NMR, TIC, EIC, and ECC. The PCA obtained by low-level fusion did not reveal new clusters with respect to the PCA carried out with a single technique (data not shown). The scenario changed with mid-level fusion. The combination of two techniques improved the results of the LC-MS Q-TOF and 13C-NMR techniques. A good separation was obtained by combining the NIR and EIC datasets: the separation of samples M and E is evident in the PCA. Moreover, a significant combination was that of 13C-NMR and EIC data, which allowed the identification of one group with samples B, E, and M and another group containing the three L samples, confirming the results observed by EIC. Finally, the combinations with 1H-NMR did not give results different from those of the single technique, namely the separation of groups N and G. The combination of TIC and 1H-NMR data was also performed, but no significant improvement in clustering was observed (data not shown).
The mid-level fusion combining three analytical techniques demonstrated that the NIR-13C-NMR-EIC fusion led to the best separation: the L samples were grouped together, and samples B, E, and M formed another group. The sample of manufacturer H (H1) showed characteristic behaviour, being isolated from the other groups (Figure 8).
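The difference between the two fusion levels can be sketched in a few lines. The text does not detail how the fusions were implemented, so the sketch follows the usual chemometric definitions as an assumption: low-level fusion concatenates the (autoscaled) variables of each block, whereas mid-level fusion concatenates features extracted from each block, here the first PCA scores. All names and the random placeholder blocks are hypothetical.

```python
import numpy as np


def autoscale(X):
    """Mean-centre each column and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - X.mean(axis=0)) / std


def pca_scores(X, n_components):
    """Scores of the first principal components of a mean-centred matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def low_level_fusion(blocks):
    """Concatenate the autoscaled variables of all blocks side by side."""
    return np.hstack([autoscale(b) for b in blocks])


def mid_level_fusion(blocks, n_components=2):
    """Concatenate autoscaled PCA scores extracted separately from each block."""
    return np.hstack([autoscale(pca_scores(b, n_components)) for b in blocks])


# Placeholder blocks standing in for NIR, 13C-NMR, and EIC tables (32 samples each).
rng = np.random.default_rng(1)
nir = rng.normal(size=(32, 50))
c13 = rng.normal(size=(32, 20))
eic = rng.integers(0, 2, size=(32, 8)).astype(float)

low = low_level_fusion([nir, c13, eic])
mid = mid_level_fusion([nir, c13, eic], n_components=2)
fused_scores = pca_scores(mid, 2)  # final PCA on the fused table
```

A practical consequence of the mid-level scheme is that every technique contributes the same number of columns to the fused table, so a block with thousands of variables, such as NIR, cannot dominate a block with only a few.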

Discussion
This manuscript investigated the characterisation of the metformin drug substances of different manufacturers with NIR, 1H-NMR, 13C-NMR, and high-resolution LC-MS combined with multivariate analysis to determine a possible classification for each manufacturer in view of prospective authenticity studies.
The PCA presented in this study was able to group different batches of metformin from the same manufacturer. Specifically, M samples are generally close in a cluster, as shown by the PCAs of NIR, 1H-NMR, 13C-NMR, and LC-MS EIC data, and E samples are mainly grouped in LC-MS (in targeted EIC and in untargeted ECC data) and partially in NIR and 13C-NMR.
Among all the investigated analytical techniques, NIR and 1H-NMR data provide the most suitable separation of the samples into groups. In the NIR data, the largest cluster is represented by the G and N samples. 1H-NMR is the only technique that distinguished samples N from samples G and associated specific chemical shift regions with these clusters. Both NIR and 1H-NMR showed the proximity of samples P (P1 and P2) in the plots. A more dispersed behaviour was observed for other manufacturers: L1 and N2 in NIR; 13
PCA on mid-level data fusion was able to separate the B, E, and M samples into one cluster and the L samples into another. The single H sample is well separated from all other samples. 13C-NMR and NMR data processing with bucket tables in combination with multivariate analysis was applied. The PCA of 13C-NMR indicated a tendency of some samples to gather, although a clear separation of clusters could not be defined. The combination of 13C-NMR and chemometric methods has been mainly used to obtain a fingerprint in metabolomic studies, whereas pharmaceutical studies are limited (Silvestre et al., 2009; Ohmenhaeuser et al., 2013; Erich et al., 2015; Lia et al., 2020).
The bucket tables allowed the characterization of batches by assessing the normalised intensity differences in specific chemical shift regions. The bucketing method could be used to reduce the minor NMR peak misalignment influence due to different pH, salt, and even temperature issues (Emwas et al., 2018;Wang et al., 2020).
To the best of our knowledge, this is the first time a triplex approach to elaborate LC-MS data for chemometric analysis has been followed. The chromatographic signals TIC and ECC were considered untargeted approaches, while EIC, calculated on known impurities, was considered a targeted approach. The results obtained are evidence that raw data processing is not trivial and can disclose different grouping tendencies. It should be noted that, in the present case, the targeted approach separated the samples into groups more clearly than the untargeted approaches. Moreover, for untargeted ECC data processing, the results could be influenced by the filtering levels (cut-off on peak height, peak spacing tolerance, MFE score, R.T. alignment, and mass alignment), so a more in-depth study is needed to clarify the contribution of each filter parameter to the results.

FIGURE 6
Example of the extracted compound chromatograms (ECCs) obtained by MassHunter Profinder® for six samples for an unknown compound found by the software application at R.T. = 7.3 min with average mass = 326.0007 Da.
In conclusion, most of the metformin manufacturers selected for this study were characterised. The results specifically showed that M samples are clustered by NIR, 1
The results obtained in this study highlight the capability of an integrated analytical platform combined with chemometric analysis to make a positive contribution to authenticity studies on drug substances. Different manufacturing processes have been linked to different groups obtained by PCA and correlated with the origin of drugs (Deconinck et al., 2008). Structurally complex drugs manufactured by multiple possible synthetic pathways, multi-step synthetic processes, and with many known and unknown impurities are more prone to exhibiting differences in spectroscopic and spectrometric data and in chemometric models (Remaud et al., 2013). Metformin is a relatively simple molecule manufactured by a facile synthetic route encompassing only a single-step reaction of dicyandiamide and dimethylamine with a relatively well-established impurity pattern (Shalmashi, 2008; Yendapally et al., 2020). In addition, the limited number of samples per manufacturer may also explain the observed trend, since the influence and impact of batch-to-batch variability cannot be completely excluded, especially if it results in small differences such as low-intensity signals that potentially characterize chemometric separation.
These results underline the need to address the potential effects of the limited variability of the manufacturing process and the consequent low probability of the presence of multiple unknown impurities. Nonetheless, different techniques or their fusion allow the clustering of some metformin API samples.
The authors' previous results obtained for ibuprofen and carvedilol drug substances (Raimondo et al., 2020; Raimondo et al., 2022) evidenced a chemometric separation in PCA and cluster analyses based on API origin (EU and non-EU) related to specific signals in 1H-NMR and in LC or LC-MS Q-TOF. This paper reports the results obtained with more techniques and different data analysis approaches. For metformin API, a separation based on EU or non-EU origin was not found; however, clustering was observed for some manufacturers. The comparison of results between these APIs highlights that the separation seems to strictly depend on the manufacturing process, which is in line with previous studies (Deconinck et al., 2008). In the absence of complex multi-step processes and of many known impurities, a separation based on manufacturers or on origin (geographical area) was not clearly obtained. Nevertheless, as observed for metformin, most of the samples form clusters in one or more techniques or in their fusion. On the other hand, as reported in Li et al. (2020), no single technique can provide complete profiling. As a general strategy, we believe that a multi-technique approach and the knowledge of the manufacturing process are important pre-requisites to analysis.

FIGURE 8
Score plot of PCA with two principal components using the NIR-13C-NMR-EIC fusion dataset.
This study is part of the efforts of the European Official Medicines Control Laboratories to develop methods to identify possible falsifications of the origin/manufacturer of active drug substances. The application of chemometrics to the study of the fingerprinting of active drug substances is being increasingly developed, and the detection of more discriminant and predictive analytical techniques could depend on the specific drug substance and its manufacturing processes (Deconinck et al., 2022; Rebiere et al., 2022).
Current challenges are aimed at discriminating among different manufacturers of active substances to detect changes in manufacturing processes and cases of pharmaceutical falsification.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions
MR: formulation and evolution of research goals and aims; application of the chemometric approach to analyse study data; performing the NMR experiments and data processing; and preparation, creation, and presentation of the published work. FP: formulation and development of research goals and aims; performing the NIR experiments and data collection; and preparation, creation, and presentation of the published work. FA: formulation and evolution of research goals and aims; performing the LC-MS experiments and data processing; and preparation, creation, and presentation of the published work. GD: supporting NIR experiments; chemometric analysis; and preparation, creation, and presentation of the published work. MG: formulation and development of research goals and aims; performing the LC-MS experiments and data processing; preparation, creation, and presentation of the published work; and oversight and leadership responsibility for research activity planning and execution.