Exploring the use of NIR and Raman spectroscopy for the prediction of quality traits in PDO cheeses

The aims of this proof of principle study were to compare two chemometric approaches, a Bayesian method and Partial Least Squares (PLS) regression with PLS-discriminant analysis (PLS-DA), for the prediction of the chemical composition and texture properties of the Grana Padano (GP) and Parmigiano Reggiano (PR) PDO cheeses from NIR and Raman spectra, and to quantify their ability to distinguish between the two PDO and among their ripening periods. For each dairy chain consortium, 9 cheese samples from 3 dairy industries were collected, for a total of 18 cheese samples. Three seasoning times were chosen for each dairy industry: 12, 20, and 36 months for GP and 12, 24, and 36 months for PR. A portable NIR instrument (spectral range: 950–1,650 nm) was used on 3 selected spots on the paste of each cheese sample, for a total of 54 spectra collected. An Alpha300 R confocal Raman microscope was used to collect 10 individual spectra for each cheese sample in each spot, for a total of 540 Raman spectra collected. After the detection of possible outliers, the NIR and Raman spectra were also concatenated (NIR + Raman). All the cheese samples were assessed in terms of chemical composition and texture properties following the official reference methods. A Bayesian approach and PLS-DA were applied to the NIR, Raman, and fused spectra to predict the PDO type and seasoning time. The PLS-DA reached the best performance, with 100% correctly identified PDO type using Raman only. The fusion of the data improved the results in 60% of the cases with the Bayesian approach and in 40% with the PLS-DA approach. A Bayesian approach and a PLS procedure were applied to the NIR, Raman, and fused spectra to predict the chemical composition of the cheese samples and their texture properties. In this case, the best performance in validation was reached with the Bayesian method on Raman spectra for fat (R2VAL = 0.74). The fusion of the data was not always helpful in improving the prediction accuracy.
Given the limitations associated with our sample set, future studies will expand the sample size and incorporate diverse PDO cheeses.


Introduction
European PDO (Protected Designation of Origin) cheeses are outstanding examples of historic inheritance, which is characterized by diversity and tradition in the world of dairy production. These cheeses, which include renowned types like Parmigiano Reggiano (PR), Grana Padano (GP), Roquefort, and Manchego, are praised for their flavors that are closely connected to their geographic origins. Behind the artisanal craftsmanship and centuries-old traditions of PDO cheese production lies a complex interplay of chemical and physical processes. Spectroscopic techniques have emerged as indispensable tools in the scientific field, providing invaluable insights into the composition, structure, and quality of these dairy products (1). Among these techniques, Near Infrared (NIR) and Raman spectroscopy are the most used in the food industry for the quantification of crucial cheese components such as moisture, fat, and protein content (2–4), facilitating precise quality control, and for the identification and quantification of specific compounds responsible for flavor and aroma (5). The top selling PDO cheeses in the world are the Italian PR and GP, with total export value exceeding USD 1.2 billion and USD 650 million, respectively (6). Despite their reputation in the global market, monitoring their quality is a complex task involving several challenges. Spectroscopic techniques, while valuable, face specific issues when applied to these cheeses, in part because obtaining representative samples from the cheese wheels is not always easy and because developing accurate calibration models for spectroscopic analysis requires extensive data collection and validation (7). Moreover, as both cheeses are vulnerable to counterfeiting and fraudulent replication, spectroscopic techniques used for authentication must be robust enough to detect subtle differences between authentic and imitation products.
When spectroscopy is used, the instrument is also an important factor, and its choice depends, apart from available resources, on the specific goals of the analysis, the cheese type, and the desired level of detail in terms of composition prediction. For example, NIR is well suited for macronutrients, whereas Raman is more suitable for lipids, through functional groups such as ester linkages in triglycerides and phosphodiester bonds in phospholipids. The presence of these functional groups results in unique Raman peaks, enabling the differentiation of the type of lipid (8–10). For these reasons, a combination of spectroscopic techniques has been proposed to improve the accuracy of composition trait predictions from cheese spectra (11–13). The performance of the models can be evaluated in different ways: number of latent variables, cumulative variance, standard error of calibration, standard error of cross validation, coefficient of determination, similarity map and salience dimension of common space, limit of detection, limit of quantification, linearity, model fit, and uncertainties. The spectroscopic instrument and the chemometric approach used (and the related setup of parameters and features) for predicting cheese composition or other traits of interest can impact the prediction accuracy. For example, while chemometric methods like Partial Least Squares Regression (PLSR), Principal Component Analysis (PCA), or Linear Regression models are more commonly used for the prediction of food (including cheese) composition and other traits (14), Bayesian approaches are less common but have gained attention in recent years due to their ability to handle complex data structures and to provide deeper insights into uncertainty and variable selection (15). However, the choice of chemometric method often depends on several factors, including the expertise of the researchers.
Therefore, the aims of this study were to compare two different chemometric approaches, a Bayesian method and PLSR with PLS-Discriminant Analysis (PLS-DA), (i) for the prediction of the chemical composition and texture properties of the GP and PR PDO cheeses by using NIR and Raman spectra, and (ii) for their ability to distinguish between the two PDO and among their ripening periods.

Experimental design
A total of 18 cheese samples were collected from 6 dairy plants belonging to the consortia of the GP and PR dairy chains. Three dairy plants belonged to the GP and three to the PR PDO chain. For each dairy, three seasoning times were selected: 12, 20, and 36 months for GP and 12, 24, and 36 months for PR.

Collection of the spectra
A portable NIR instrument (Alba GraiNit, Padova, Italy), working within the spectral range 950–1,650 nm, was used on 3 selected spots of each cheese sample along the radius of the cheese wheel (as indicated in Figure 1), for a total of 54 spectra collected. The spectra (average of 27 spectra within PDO type) are plotted in Figure 2A for Grana Padano and in Figure 2B for Parmigiano Reggiano.
An Alpha300 R confocal Raman microscope (WITec, Ulm, Germany) was used to acquire the Raman spectra using a 532 nm laser at 40 mW and a 10x/0.25 objective. The integration time was 1 s and the number of accumulations was 20. In this case, 10 individual spectra were collected for each cheese sample in each location (3 spots), for a total of 540 Raman spectra collected. The spectra (average of 270 spectra within PDO type) are plotted in Figure 3A for Grana Padano and in Figure 3B for Parmigiano Reggiano.

Composition analyses
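The chemical composition (moisture, protein, lipids) was analyzed on all the cheese samples. Briefly, the cheese samples were ground (Thermomix TM6, Vorwerk), and approximately 2 g of ground cheese sample was weighed and used for moisture and fat analysis using the microwave moisture analyzer (Smart 5 Turbo) and Nuclear Magnetic Resonance (NMR) fat analyzer (CEM Corporation). The microwave moisture analyzer applies microwave radiation to the sample, causing the water molecules to heat up and evaporate. The moisture content is then determined by measuring the weight loss of the sample before and after the microwave treatment. For the determination of fat, the NMR can distinguish the signal produced by hydrogen protons present in fats from that produced by all other sources of protons in food matrices, such as carbohydrates and proteins. The instrument is validated according to the AOAC regulations [Peer-Verified Method 1:2004 according to (16)]. The protein content of cheese samples was determined by the Kjeldahl method (17). The steps typically involve digestion of the sample with concentrated sulfuric acid, conversion of nitrogen to ammonium sulfate, and subsequent measurement of the ammonia produced. The protein content is then determined using the total nitrogen content.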

Texture properties
Texture traits of all the cheese samples were determined using a Texture Analyzer (XT2i, Stable Micro Systems, Ltd., Godalming, Surrey, UK) with a Warner-Bratzler shear device [50 Newton (N) load cell; 2 mm/s crosshead speed]. For each cheese, 1 cylinder-shaped core sample was taken (1 cm2 cross-sectional area; 3 cm long). Texture data were reported as hardness (the maximum shear force, expressed in N), adhesiveness (the work needed to overcome the attractive force between food and other surfaces, expressed in N/s), resilience (the degree to which the cheese regains its original shape during the biting process, expressed in %), cohesiveness (the tendency of cheese to remain together, and resist breaking into several pieces, during compression), springiness (the ability of the deformed cheese to return to its initial position after the removal of the force, expressed in %), gumminess (the energy required to disintegrate a semi-solid food to a state ready for swallowing, expressed in N), and chewiness (the work needed to masticate a solid food to a state ready for swallowing, expressed in N/s).

Editing of the spectra, data fusion and Bayesian models
Before data fusion and spectra analysis, the raw absorbance values at each wavelength of each instrument's spectra were centered and scaled to a null mean and a unit variance. Then, samples having a large spectral distance (i.e., Mahalanobis distance >3) were considered outliers and removed from the calibration dataset. After scaling, the NIR and Raman spectra were concatenated and stored as a single matrix to be used for the spectra analysis. This is called low-level data fusion (18, 19). No other mathematical preprocessing was applied to the spectra. The matrix comprises m rows (number of individual samples) and n columns (measurement variables from each source). In this study, fusion data comprised a total of 2,000 variables, with the Raman and NIR instruments contributing 1,600 and 400 wavelengths, respectively.

FIGURE 2
Absorbance spectra (the solid lines represent the average absorbance and the broken lines the mean ± 1 SD) of the NIR portable instrument for Grana Padano (GP) (A); Parmigiano Reggiano (PR) (B).
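The scaling and concatenation steps of low-level data fusion can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and toy numbers are hypothetical, the population standard deviation is assumed for the unit-variance scaling, each block is assumed to hold one row per sample in the same order, and the Mahalanobis outlier screening is left out.

```python
from statistics import mean, pstdev


def autoscale(block):
    """Center and scale each column (wavelength) to zero mean and unit variance."""
    columns = list(zip(*block))
    means = [mean(col) for col in columns]
    # Assumption: a constant column is left at zero by using a scale of 1.0.
    scales = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, scales)] for row in block]


def low_level_fusion(nir, raman):
    """Autoscale each block separately, then concatenate them sample by sample."""
    if len(nir) != len(raman):
        raise ValueError("both blocks need one row per sample, in the same order")
    return [a + b for a, b in zip(autoscale(nir), autoscale(raman))]


# Toy data (made-up numbers): 3 samples, 2 NIR and 3 Raman variables.
nir = [[0.10, 0.20], [0.30, 0.10], [0.20, 0.60]]
raman = [[100.0, 5.0, 7.0], [120.0, 6.0, 7.0], [140.0, 4.0, 7.0]]
fused = low_level_fusion(nir, raman)
```

With the dimensions reported in the text, the same concatenation would give rows of 400 + 1,600 = 2,000 variables.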
The calibration models built using the Bayesian approach (15) were developed by using the Bayesian Generalized Linear Regression [BGLR; (20)] package, available in R software (21). Each trait was regressed on the pool of wavelengths using the following equation:

$$y_i = \mu + \sum_{j=1}^{t} x_{ij}\,\beta_j + e_i$$

where $y_i$ is the trait value of sample $i$, $\mu$ is the overall mean, $x_{ij}$ is the value of the NIR, Raman, or fused spectrum of sample $i$ at wavelength $j$, $i$ is the sample (from 1 to 54), $t$ is the number of wavelengths of the studied spectra (400 for NIR and 1,600 for Raman), $j$ indexes the wavelengths (950 to 1,650 nm for NIR, and from −50 to 4,000 rel. cm−1 for Raman), $\beta_j$ are the regression coefficients, and $e_i$ is the residual, assumed to be independently and identically distributed following a normal distribution with mean equal to 0 and variance equal to $\sigma^2_e$. The Bayes B model implemented in the package incorporates prior information about the model parameters and updates this information based on the observed data to estimate the effects of wavelengths on the phenotypes (15).

Editing of the spectra and PLSR models
PLSR models were built with the software Unscrambler (Aspen Tech, MA, USA). For the classification of the seasoning and PDO type of each cheese sample, PLS-DA models were built with Solo, version 9.2.1 (Eigenvector Research, Inc., Wenatchee, WA, USA). The values obtained from the different reference methods were used as reference values for the models.
For the PLSR and PLS-DA model construction, from the total 54 spectra, 36 were selected by the Kennard-Stone (21) algorithm and used as calibration samples, and the remaining 18 were kept as the validation set. In this way, the calibration set covers the maximum spectral variability, which is expected to give the best prediction capability.
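The Kennard-Stone split can be sketched in a few lines of Python. This is an illustrative implementation of the usual max-min rule on Euclidean distances, not the routine of the software used in the study; the function name and the toy data are hypothetical.

```python
from math import dist


def kennard_stone(spectra, n_select):
    """Return indices of n_select rows chosen by the Kennard-Stone max-min rule."""
    n = len(spectra)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of spectra")
    # Start from the two most distant spectra.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(spectra[pair[0]], spectra[pair[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select:
        # Add the candidate whose nearest already-selected spectrum is farthest away.
        best = max(
            remaining,
            key=lambda i: min(dist(spectra[i], spectra[s]) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy one-variable "spectra" (made-up values).
toy = [[0.0], [1.0], [2.0], [10.0], [4.0]]
calibration = kennard_stone(toy, 3)
validation = [i for i in range(len(toy)) if i not in calibration]
```

For the split described above, the call would be `kennard_stone(spectra, 36)` on the 54 spectra, with the 18 unselected indices forming the validation set.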
The spectral pretreatments used to construct the PLSR and PLS-DA models included standard normal variate (SNV) (22), used to correct baseline shifts and variations in intensity across spectra; Savitzky-Golay derivatives with second-order polynomial fitting (1st D and 2nd D) (23), used to reduce high-frequency noise through smoothing, to reduce low-frequency signal (e.g., offsets and slopes) through differentiation, and, with the second derivative, to highlight spectral features; smoothing (moving average, MA) (24); linear baseline correction (BLC) (25); and orthogonal signal correction (OSC), used to remove unwanted variation in the X-data that is unrelated to the response of interest (Y) (26). The selected pretreatments reduced the multiplicative effects derived from the physical characteristics of the samples and enhanced the differences between spectra, which supports the desired classification and quantitative models. The MA and BLC were necessary only for Raman spectra, as MA was useful to reduce the noise, whereas BLC corrected the curvature caused by the fluorescence effect.
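Two of the pretreatments listed above, SNV and moving-average smoothing, are simple enough to sketch in Python. The code is illustrative only: the window size, the edge handling, and the toy spectra are assumptions, and the Savitzky-Golay, BLC, and OSC steps are not covered.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    if s == 0:
        raise ValueError("SNV is undefined for a flat spectrum")
    return [(v - m) / s for v in spectrum]


def moving_average(spectrum, window=3):
    """Centered moving average; the window shrinks at the edges (an assumption)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [
        mean(spectrum[max(0, i - half): i + half + 1])
        for i in range(len(spectrum))
    ]


raw = [1.0, 2.0, 3.0, 4.0, 5.0]                 # made-up intensities
scaled = snv(raw)
# A spectrum with an added offset and a multiplicative factor gives the same SNV result.
shifted = snv([2.0 * v + 7.0 for v in raw])
smoothed = moving_average([1.0, 4.0, 1.0, 4.0, 1.0])
```

The comparison of `scaled` and `shifted` shows why SNV is used against baseline shifts and intensity variations: both effects cancel out within each spectrum.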

Cross-validation models
Bayesian approach
For the classification of the seasoning and PDO type of each cheese sample and the prediction of the chemical composition and texture traits, a random cross-validation was applied, in which 80% of the total records were randomly selected and used to build the equation (calibration set; CAL), and the remaining 20% of records were used to test the model (validation set; VAL). To account for sample variability, the procedure was repeated 10 times for the classification and 5 times for the quantification. The results were averaged over the replicates. The standard deviation (SD) across the replicates was also calculated. The coefficient of determination (R2VAL), the root mean squared error of validation (RMSEVAL), and the relative error of prediction (RSEP%) were used to assess the models' performances. As the Bayesian approach uses a linear regression model, it was necessary to establish a decision criterion for interpreting the predicted values. For the prediction of the PDO, the labels were converted to numerical values (0 for GP and 1 for PR) and a critical threshold was set at 0.5. All predicted values <0.5 were attributed to GP, while values >0.5 were classified as PR. For the prediction of the seasoning time, the model used the numerical value in months and the predicted values were divided into 3 classes: "young" (<17 months), "mid" (17-28.5 months), and "old" (>28.5 months). The ranges were chosen to distinguish the 3 seasoning times and to allow the comparison between PDO (as their seasoning times were not the same). The percentage of attribution of each class was also calculated to assess the model's accuracy.
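The decision criteria that turn the continuous Bayesian predictions into classes can be written as a short Python sketch. The thresholds come from the text; the function names, the toy predictions, and the handling of a prediction of exactly 0.5 (which the text does not specify) are assumptions.

```python
def classify_pdo(predicted, threshold=0.5):
    """Map a continuous prediction to a PDO label (GP coded as 0, PR coded as 1)."""
    # Assumption: a value exactly equal to the threshold is assigned to PR.
    return "GP" if predicted < threshold else "PR"


def classify_seasoning(predicted_months):
    """Bin a predicted seasoning time (months) into the three classes of the text."""
    if predicted_months < 17:
        return "young"
    if predicted_months <= 28.5:
        return "mid"
    return "old"


def percent_correct(true_labels, predicted_labels):
    """Share of samples whose predicted class equals the reference class, in %."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * hits / len(true_labels)


predicted_months = [11.2, 19.6, 30.1, 26.0]      # made-up model outputs
reference = ["young", "mid", "old", "old"]
assigned = [classify_seasoning(m) for m in predicted_months]
accuracy = percent_correct(reference, assigned)
```

In this toy case the last sample (predicted at 26.0 months) falls in the "mid" class although its reference class is "old", so three of four samples are correctly attributed.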

PLSR and PLS-DA approaches, and data fusion
Prior to any calibration or classification model, an exploratory analysis was carried out to find the combination of pretreatments that provided the best discrimination between PDO and ripening time classes.
For each instrument, prior to data fusion and spectra analysis, the raw absorbance values of each wavelength of the spectra were centered and scaled to a null mean and a unit variance. Then, samples having a large spectral distance (i.e., Mahalanobis distance >3) were considered outliers and removed from the calibration dataset. After scaling, the NIR and Raman spectra were concatenated and stored as a single spectra matrix that could be used for the spectra analysis. The matrix comprises m rows (number of individual samples) and n columns (measurement variables from each source). Fusion data comprised a total of 2,000 variables, with the Raman and NIR instruments contributing 1,600 and 400 wavelengths, respectively. Calibration models were constructed using the PLSR algorithm and internally validated by cross-validation (the leave-one-out method for PLSR and venetian blinds for PLS-DA). The models were assessed with the coefficient of determination (R2VAL), the root mean squared error of validation (RMSEVAL), and the relative standard error of prediction (RSEP%):

$$\mathrm{RMSE_{VAL}} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{n}}$$

$$\mathrm{RSEP\%} = 100 \times \sqrt{\frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} y_i^2}}$$

where $y_i$ is the reference value for validation set sample $i$, $\hat{y}_i$ is the predicted value for validation set sample $i$, and $n$ is the number of samples in the validation set.
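The validation statistics can be computed with a few lines of Python. This is a sketch under stated assumptions: RSEP% follows a definition commonly used in chemometrics (residual sum of squares relative to the sum of squared reference values), R2 is computed as one minus the ratio of residual to total sum of squares, and the numbers are made up; the software used in the study may define these quantities slightly differently.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted values."""
    return sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def rsep_percent(y_true, y_pred):
    """Relative standard error of prediction, in % (assumed common definition)."""
    residual = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 100.0 * sqrt(residual / sum(y ** 2 for y in y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination as 1 - SS_res / SS_tot (an assumption)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


reference = [30.0, 32.0, 34.0, 36.0]   # made-up fat contents, %
predicted = [31.0, 32.0, 33.0, 36.0]   # made-up model outputs, %
rmse_val = rmse(reference, predicted)
rsep_val = rsep_percent(reference, predicted)
r2_val = r_squared(reference, predicted)
```

Because RSEP% divides by the magnitude of the reference values, it allows traits measured on different scales (for example fat in % and hardness in N) to be compared, which RMSE alone does not.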
The optimum number of latent variables was determined from a plot of the explained variance against the number of factors. Then, the initial model was refined by selecting those factors resulting in the lowest relative standard error of prediction (RSEP%), bias, and SD for the validation set.
For the classification, the percentage of attribution of each class was also calculated to assess the model's accuracy.
Permutation tests were then conducted on the Bayesian and PLS models to assess their predictive performance and rule out potential overfitting to the spectral data's inherent structure. The tests involved randomly shuffling the class labels (PDO and age) 500 times and refitting the models to each shuffled dataset. From this test we constructed a null distribution representing the range of outcomes expected under the assumption of no true relationship between the spectral data and the class labels. A key indicator of a model's genuine predictive ability is its performance relative to this null distribution. If only a small percentage of the permuted datasets yields results comparable to or better than those of the original model, the original model's performance is unlikely to be attributable to chance associations or overfitting. Instead, this provides evidence that the model has captured meaningful patterns in the spectral data that are genuinely predictive of the class labels (27). Only 2% of the permutations yielded results comparable to or better than those of the original Bayesian models for all spectral matrices, and none of the permutations achieved results similar to the developed PLS model (data not shown). These results are therefore not discussed further: the models predicted the class label (PDO or age) from the spectral data without overfitting to a chance structure of the data.
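The logic of the permutation test can be illustrated with a small Python sketch. A nearest-centroid classifier scored on the data it was fitted to stands in for the Bayesian and PLS models, which are not reproduced here; the function names, the toy data, and the random seed are hypothetical.

```python
import random
from math import dist


def nearest_centroid_accuracy(X, y):
    """Accuracy of a nearest-centroid classifier on its own training data (stand-in model)."""
    labels = sorted(set(y))
    centroids = {}
    for label in labels:
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    predicted = [min(labels, key=lambda lab: dist(x, centroids[lab])) for x in X]
    return sum(p == t for p, t in zip(predicted, y)) / len(y)


def permutation_test(X, y, score_fn, n_permutations=500, seed=0):
    """Return the observed score and the share of label shuffles scoring at least as well."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    shuffled = list(y)
    as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)          # break any link between spectra and labels
        if score_fn(X, shuffled) >= observed:
            as_good += 1
    return observed, as_good / n_permutations


# Toy data (made-up numbers): two well-separated groups of four "spectra".
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.3], [5.1, 5.0]]
y = ["GP"] * 4 + ["PR"] * 4
observed, fraction = permutation_test(X, y, nearest_centroid_accuracy)
```

A small `fraction` plays the role of the 2% reported above: few shuffled label sets reach the score obtained with the true labels.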

Results and discussion
Table 1 reports the descriptive statistics for composition traits and texture properties of the two types of PDO cheeses, PR and GP. Regarding the chemical composition, PR had a slightly higher moisture content (27.5%) compared to GP (26.7%), with similar CV (9 and 8%, respectively, for PR and GP). Moreover, PR contained more fat (33.7%) and less protein (34.3%) contents, compared to GP (30.1 and 32.4%, respectively, for fat and protein). Regarding texture properties, the hardness of both cheeses was similar, with PR having a mean hardness of 18.38 N and GP of 18.33 N. The CV values were relatively high, indicating some variability within each cheese type (15 and 19%, respectively), probably due to the wide range of seasoning of the analyzed samples. Adhesiveness refers to the ability of a cheese sample to stick or adhere to surfaces. Parmigiano Reggiano showed higher adhesiveness (−0.76 N/s) compared to GP (−0.47 N/s). Both had negative values, indicating that the cheese samples exhibited a low tendency to stick or adhere to surfaces. This can be a desirable characteristic for some types of cheese, like the hard ones. The CV was high (30 and 66%, respectively), which could be due to the different ripening times within each PDO cheese. Both cheeses exhibited similar resilience values (9.30 and 9.20%, respectively, for PR and GP), with higher CV for GP (9 and 14%, respectively). Cohesiveness quantifies the degree to which a cheese sample resists falling apart or fragmenting when it is bitten, chewed, or compressed. It provides insights into the cheese's ability to maintain its integrity and internal structure during consumption, and from this point of view the two PDO had similar values (0.27 and 0.26, respectively, for PR and GP). Springiness was higher in GP (54.8%) compared to PR (47.3%), with also higher CV (27 and 20%, respectively). Both cheeses had similar gumminess values (4.98 N and 4.87 N, respectively, for PR and GP), with higher CV for GP (23 and 28%, respectively). Grana Padano was also slightly chewier (2.82 N/s) than PR (2.45 N/s), with also higher CV (51 and 42%, respectively).

Prediction of seasoning and PDO type: Bayesian vs. PLS-DA across different type of spectra
Tables 2, 3 show the numbers of correctly and wrongly identified samples for the seasoning and PDO type by using the Bayes B model (Table 2) and the PLS-DA model (Table 3) with NIR, Raman, and fused (NIR + Raman) spectra. Regarding the seasoning and PDO identification with the Bayes B model (Table 2), the NIR technique performed relatively well for seasoning, with accuracy ranging from 75% (young) to 90% (mid), and gave a correct identification of 69% for GP and 59% for PR. Raman performed poorly on the seasoning time, with a correct identification of 12 and 17%, respectively, for young and old ripening times, but achieved better results for mid seasoning (83%). The exact reasons for the observed performances can be multifaceted. However, one potential factor contributing to the poor results for young and old ripening times when using a Bayesian approach is the dataset size: in a small dataset, the observed information may not be comprehensive enough to fully capture the complexity of the relationship between Raman spectra and the ripening times of cheese. For the PDO identification, Raman achieved 83% accuracy for GP and 78% for PR. To the authors' knowledge, this is the first work presenting results on data fusion for the prediction of PDO type and seasoning time. The spectra fusion provided consistent results, with accuracies from 64 to 77% for the seasoning time and high accuracy for both GP and PR (89%). Regarding PLS-DA used for the seasoning identification (Table 3), NIR achieved relatively high accuracy, ranging from 66 to 88%. For the PDO identification, NIR achieved perfect accuracy (100%) for GP and high accuracy (88%) for PR. Raman exhibited moderate accuracy, ranging from 50 to 75% for the seasoning time, but it achieved 100% accuracy for both PDO cheeses. The fusion of the spectra provided good accuracy, from 66 to 83%, for the identification of the seasoning time, and was excellent for GP (100%) and good (77%) for PR. Comparing the two chemometric approaches, the Bayes B model achieved very good accuracy for some categories within seasoning and PDO type but, overall, the PLS-DA models outperformed the Bayes B model in terms of accuracy across all the types of spectra (NIR, Raman, and fused). In particular, PDO identification using the PLS-DA models showed very high accuracy, with GP being correctly identified at 100% in all three types of spectra. As reported by Silva et al. (28), the chemometric methods most commonly used for the differentiation between cheese samples by origin were linear discriminant analysis, principal component analysis, and PLS-DA. For example, Karoui et [...]

In the case of data fusion, comparing the results from the integration of the NIR and Raman spectra with those of the individual techniques, it emerges that, when using the Bayesian model, the fusion of the data increased the performance in the prediction of the early seasoning (<12 months) and in the identification of the PDO type. For the mid and old seasoning, the data fusion did not improve (mid seasoning, 77% vs. 90 and 83%, respectively, for fused vs. NIR and Raman) or improved only partially (old seasoning, 17% for Raman) the correct % of identification (Table 2). When using PLS-DA, the data fusion had the best prediction accuracy for the mid and old seasoning, improved the prediction over Raman for the early seasoning, did not change it for GP, and reduced the correct % of identification for PR (Table 3). It is interesting to note that in some cases the fusion did not help to improve the prediction accuracy. This could be because NIR and Raman may capture similar, and therefore redundant, information about the sample, so that the fused spectra provide no additional information (32). Other studies applied NIR and Raman spectra fusion for food authentication in other matrices: Márquez et al. (33) tested two data fusion strategies (mid and high level) combined with a multivariate classification approach for the [...]

Prediction of chemical composition and texture properties: Bayesian vs. PLSR for NIR
Tables 4, 5 report the prediction statistics of the composition and texture properties of cheese deriving from the Bayesian and PLSR procedures using NIR spectra from the cheese samples. Regarding the Bayesian approach (Table 4), the R2CAL values for fat, protein, and moisture ranged from 0.11 (fat) to 0.50 (moisture), with quite high SD (from ±0.05 to ±0.13), suggesting substantial variability in the model's performance. Indeed, the errors of calibration were also high (from 1.57 to 2.91% for RMSECAL and from 5.8 to 9.2% for RSEC%). Consequently, the results in the validation set were not accurate enough to be used at the dairy industry level, with R2VAL < 0.50 within composition traits and high RSEP% (from 8 to 9%). Results for the texture properties were even worse, especially considering that the R2CAL values were lower than the R2VAL values. This could be due to the randomness in the data split (35) and/or the small sample size and/or overfitting due to the complexity of the model (36). It is worth mentioning that the effectiveness of a chosen approach (Bayesian or PLSR in this study) depends on the specific characteristics of the data and the nature of the trait to be predicted. In some cases, Bayesian modeling with informative priors can provide valuable insights even with small datasets, whereas PLSR may be a more practical and less computationally demanding option. Indeed, the PLSR procedure (Table 5) generally showed better results than the Bayesian approach. Overall, the R2CAL ranged from 0.37 (chewiness) to 0.89 (protein), with lower RSEC% (from 2.4 to 39%) compared to the Bayesian approach (RSEC% from 5.8 to 43%; Table 4). The R2VAL values were good for moisture (0.69) and fat (0.63), with generally higher values compared to the Bayesian model, although for some traits the R2VAL was higher than the R2CAL (adhesiveness). Comparing the two chemometric approaches, the Bayes B model using NIR spectra appeared to have some limitations in predicting both the composition and texture properties of cheese. The model's performance varied across different traits, but with generally low R2 values and relatively high errors. This suggests that the model may require further refinement or the inclusion of additional features to improve its predictive accuracy for these cheese properties. In contrast, the PLSR procedure using NIR spectra appeared to be more effective in predicting both the composition and texture traits of cheese. However, these results may, in part, be attributed to the inherent constraints associated with the small sample size, underscoring the importance of considering the potential impact of limited data when interpreting the predictive capabilities of the model. There is an extensive literature on the application of PLSR and NIR spectroscopy to the prediction of cheese chemical composition. In diverse kinds of cheeses and curds, all predictions of fat and the majority of predictions of protein and moisture/dry matter can be deemed excellent [R2 > 0.90; (37-39)]. NIR spectroscopy is, therefore, highly suitable for predicting main chemical components like fat, protein, and moisture, and it has also proved to be effective for monitoring the compliance of nutritional labels with EU tolerance limits for a wide variety of food products (40). However, it is less reliable for predicting components present in small amounts in cheese products [e.g., minerals, fatty acids; (41)], since the amount of a component can alter the accuracy of prediction. NIR has a great degree of spectrum stability, which has made it quite popular at the industry level (42). However, the literature on NIR spectroscopy for cheese quality evaluation has become scarcer in recent years (43), because applications for evaluating other attributes are inadequate and innovative chemometric methods are not extensively used in research.

[...] VAL values in all three types of data, suggesting that they might be challenging to predict using these spectroscopic/chemometric techniques. In the scientific literature, the highest number of publications incorporating data fusion techniques concerns alcoholic beverages (27%) [e.g., (18)], followed by fruit and vegetables (17%) [e.g., (50)] and oils (13%) [e.g., (19)]. Milk and dairy products cover the smallest percentage (only 5%) (49); therefore, it is difficult to compare our results with the current literature. However, it is important to mention that simply combining data from various instruments (in this case, NIR and Raman) does not automatically improve prediction accuracy. For example, data fusion may not be advantageous if the single instruments produce extremely comparable information or if one of them introduces excessive noise. It is critical to examine the uniqueness of the information provided by each instrument and carefully consider whether the benefits of fusion outweigh the obstacles and potential drawbacks, such as redundancy and noise amplification (51, 52). This consideration becomes particularly pertinent in the context of a small sample size, where the potential benefits of data fusion must be carefully interpreted.
In summary, the most effective method was the PLS-DA approach, applied to NIR spectra for classifying seasoning time and to Raman spectra for classifying PDO type. NIR coupled with the PLS technique was the best option for chemical traits. On the other hand, with the exception of cohesiveness (Raman) and resilience (NIR/Raman), which had an RSEP% lower than 10%, no chemometric strategy or spectroscopic technique predicted texture well enough to be employed in the dairy industry.

Conclusion
This proof of principle study provides new insights into the application of chemometric approaches for predicting the characteristics of GP and PR PDO cheeses. Specifically, it focused on the use of NIR and Raman spectra and their integration to achieve these predictions, as well as the potential to distinguish between the two PDO and their various ripening stages. Regarding the classification models, PLS-DA achieved the best results, correctly identifying the PDO type at 100%. Data fusion improved the results in 60% of the cases using the Bayesian approach and in 40% using the PLS-DA approach. Regarding the prediction of chemical composition and texture traits, the Bayesian technique using Raman spectra for fat provided the best performance in validation. It is important to highlight that the accuracy of predictions did not consistently improve with data fusion. This suggests that the effectiveness of data fusion may vary depending on the specific analysis and the methods employed, and that when and how to apply data fusion in such studies should be carefully considered. Mathematical spectra treatment before and after fusion may enhance prediction accuracy, especially when dealing with inherently distinct techniques. Moreover, a larger dataset of high-quality data is needed to improve the statistical power of the analysis. This can also help to validate the models and reduce the risk of overfitting.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

FIGURE 1
Example of a cheese sample analyzed and related spots (1: paste near the crust; 2: middle sample paste; 3: core of the entire cheese wheel).
of ground cheese sample was weighed and used for moisture and fat analysis using the microwave moisture analyzer (Smart 5 Turbo) and the Nuclear Magnetic Resonance (NMR) fat analyzer (CEM Corporation). The microwave moisture analyzer applies microwave radiation to the sample, causing the water molecules to heat up and evaporate. The moisture content is then determined from the weight loss of the sample, measured before and after the microwave treatment. For the determination of fat, the NMR analyzer distinguishes the signal produced by the hydrogen protons present in fats from that produced by all other sources of protons in food matrices, such as carbohydrates and proteins. The instrument is validated according to the AOAC regulations [Peer-Verified Method 1:2004 according to (16)].

FIGURE 3
Absorbance spectra (the solid lines represent the average absorbance and the broken lines the mean ± 1 SD) of the Raman instrument for Grana Padano (GP) (A); Parmigiano Reggiano (PR) (B).

TABLE 2
Identification of seasoning and PDO using the Bayes B Model with NIR, Raman, and Fused Spectra.

TABLE 1
Descriptive statistics (Mean ± SD, and CV) of composition and texture properties of Parmigiano Reggiano and Grana Padano cheese samples.

TABLE 3
Identification of seasoning and PDO using the PLS-DA Models with NIR, Raman, and Fused Spectra.

TABLE 5
Prediction statistics of composition and texture properties of cheese deriving from the PLS regression procedure using NIR spectra from paste cheese samples. R2CAL and R2VAL, coefficient of determination for the calibration and the validation; RMSECAL and RMSEVAL, Root Mean Square error for the calibration and validation; RSEC% and RSEP%, Relative error for the calibration and the validation; Slope of the calibration equation; Prediction bias, average difference between the predictions and the labels in dataset in absolute value.

TABLE 4
Prediction statistics (Mean ± SD of the 5 replicates) of composition and texture properties of cheese deriving from the Bayesian model Cross-Validation procedure using NIR spectra from paste cheese samples. R2CAL and R2VAL, coefficient of determination for the calibration and the validation ± Standard Deviation; RMSECAL and RMSEVAL, Root Mean Square error for the calibration and validation ± Standard Deviation; RSEC% and RSEP%, Relative error for the calibration and the validation ± Standard Deviation; Slope of the calibration equation ± Standard Deviation; Prediction bias, average difference between the predictions and the labels in dataset in absolute value ± Standard Deviation.

TABLE 6
Prediction statistics (Mean ± SD of the 5 replicates) of composition and texture properties of cheese deriving from the Cross-Validation procedure using Raman spectra from paste cheese samples. R2CAL and R2VAL, coefficient of determination for the calibration and the validation ± Standard Deviation; RMSECAL and RMSEVAL, Root Mean Square error for the calibration and validation ± Standard Deviation; RSEC% and RSEP%, Relative error for the calibration and the validation ± Standard Deviation; Slope of the calibration equation ± Standard Deviation; Prediction bias, average difference between the predictions and the labels in dataset in absolute value ± Standard Deviation.

TABLE 7
Prediction statistics of composition and texture properties of cheese deriving from the PLS regression procedure using Raman spectra from paste cheese samples. R2CAL and R2VAL, coefficient of determination for the calibration and the validation; RMSECAL and RMSEVAL, Root Mean Square error for the calibration and validation; RSEC% and RSEP%, Relative error for the calibration and the validation; Slope of the calibration equation; Prediction bias, average difference between the predictions and the labels in dataset in absolute value.

TABLE 8
Prediction statistics (Mean ± SD of the 5 replicates) of composition and texture properties of cheese deriving from the Cross-Validation procedure using fused NIR and Raman spectra from paste cheese samples. R2CAL and R2VAL, coefficient of determination for the calibration and the validation ± Standard Deviation; RMSECAL and RMSEVAL, Root Mean Square error for the calibration and validation ± Standard Deviation; RSEC% and RSEP%, Relative error for the calibration and the validation ± Standard Deviation; Slope of the calibration equation ± Standard Deviation; Prediction bias, average difference between the predictions and the labels in dataset in absolute value ± Standard Deviation.

TABLE 9
Prediction statistics of composition and texture properties of cheese deriving from the PLS procedure using Fused Spectra from paste cheese samples. R2CAL and R2VAL, coefficient of determination for the calibration and the validation; RMSECAL and RMSEVAL, Root Mean Square error for the calibration and validation; RSEC% and RSEP%, Relative error for the calibration and the validation; Slope of the calibration equation; Prediction bias, average difference between the predictions and the labels in dataset in absolute value.

VAL values falling between those of Raman and fused spectra. For some attributes (e.g., fat, protein, moisture), NIR spectra outperform fused spectra regarding R 2 and R 2