ORIGINAL RESEARCH article

Front. Plant Sci., 15 January 2026

Sec. Crop and Product Physiology

Volume 16 - 2025 | https://doi.org/10.3389/fpls.2025.1736546

Quantitative analysis of tobacco blending proportions based on hyperspectral imaging and data fusion

Yifan Jiang1†, Qinlin Xiao1,2†, Xudong Huang3, Ruifang Gu2, Jing Wen2, Xixiang Zhang2, Yang Liu2, Li Li2, Xiaojing Chen4, Juan Yang2*, Yong He1
  • 1College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, China
  • 2Technology Center, China Tobacco Sichuan Industrial Co., Ltd., Chengdu, China
  • 3School of Opto-Electronic Engineering, Changchun University of Science and Technology, Changchun, China
  • 4College of Electrical and Electronic Engineering, Wenzhou University, Wenzhou, China

The rapid and accurate detection of tobacco blending proportions is essential for quality control in the tobacco industry. This study proposes a method for the quantitative analysis of tobacco components based on multispectral fusion, integrating visible-near-infrared (Vis-NIR) and near-infrared (NIR) spectral data. The method employs the minimum covariance determinant (MCD) for anomaly detection and constructs a quantitative model using partial least squares regression (PLSR). The experimental data comprise two matrices of dimensions 400 × 90 and 220 × 90, each containing 90 samples. Experimental results demonstrate that multispectral fusion significantly improves the model’s quantitative analysis performance compared to using a single spectrum. The adopted preprocessing strategy effectively reduces noise interference and enhances feature extraction capability. When predicting tobacco silk content, the fused spectral model achieved the highest prediction accuracy with R2 of 0.8873. The innovation of this study lies in the proposed multispectral data optimization fusion and preprocessing strategy, which facilitates rapid detection of tobacco constituents and offers an optimal and efficient method. This approach provides a reliable technical solution and advances spectral detection technology in the tobacco and related industries.

1 Introduction

As a globally significant cash crop, tobacco plays a pivotal role in agricultural economies, industrial manufacturing, and international commerce. Southwest China has established itself as the nation’s premier production base for premium flue-cured tobacco, due to its exceptional climatic conditions and ideal summer growing environment (Yao et al., 2024). Within this context, tobacco blend ratio testing represents a fundamental quality control measure for maintaining consistent sensory characteristics in cigarette products (Wu et al., 2022). Different tobacco varieties, growing regions, and quality grades demonstrate marked variations in aromatic profile, flavor intensity, and smoking characteristics (Niu et al., 2022). Critically, slight deviations in blending proportions can result in noticeable batch-to-batch inconsistencies - manifested as diminished fragrance, undesirable aftertastes, or increased irritation - all of which directly compromise the consumer experience. Maintaining precise blend ratios is essential for preserving the distinctive sensory signature of each cigarette, ensuring every unit faithfully reproduces the brand’s intended flavor architecture. Beyond quality assurance, optimized blending protocols enable efficient raw material utilization by preventing both the overuse of premium leaves and excessive incorporation of lower-grade tobaccos, while simultaneously guaranteeing compliance with both manufacturing specifications and regulatory standards. Through rigorous monitoring and control of blending uniformity, manufacturers can deliver product consistency at scale, meeting consumer expectations for sensory performance while strengthening brand equity in an increasingly competitive marketplace.

At present, the conventional methods used in the tobacco industry for component determination are chiefly chemical analysis by chromatographic techniques and solvent extraction, and physical detection by microscopic observation (Chen et al., 2021; Losso et al., 2022; Li et al., 2024a). While these approaches can determine critical components such as nicotine, sugar, and tar in tobacco, their limitations are equally pronounced. Chemical analysis typically requires complex pretreatment, which is both time-consuming and destructive (Losso et al., 2022). Physical methods, on the other hand, are susceptible to subjective interpretation and lack the requisite precision. Moreover, these methods struggle to support real-time monitoring, to detect trace components, and to resolve complex mixtures. Equipment and maintenance costs are also high, which hampers production efficiency and makes precise process control difficult. Hyperspectral imaging and similar emerging technologies are therefore rapidly becoming a focus of industrial research.

Hyperspectral imaging technology (HSI) demonstrates unique advantages in the field of quality inspection (Özdoğan et al., 2021; Ahmed et al., 2024; Medina–García et al., 2025). It combines spectral analysis with image recognition to simultaneously capture spectral and spatial distribution data of the target object, thereby achieving a “spectral-image integration” inspection effect (Zhao et al., 2023). Additionally, its ultra-high spectral resolution and rich spatial information enable it to more accurately identify and locate different components within a sample. In terms of detection performance, HSI offers fast, non-destructive, high-sensitivity detection with real-time monitoring capabilities. These characteristics make it highly valuable in applications such as agricultural product quality grading (Gao and Xie, 2024) and impurity identification (Jia et al., 2024; Nargesi et al., 2024; Shiny et al., 2024; Fei et al., 2025). Additionally, with the development of machine learning algorithms, the processing efficiency and analysis accuracy of hyperspectral data have been significantly improved, providing new technical pathways for quality detection in complex scenarios. Currently, this technology has been successfully applied in multiple fields such as food component visualization analysis (Shi et al., 2024; Seong et al., 2025) and drug uniformity assessment (Belay et al., 2021; Rocha de Oliveira and de Juan, 2021), with its detection effectiveness widely validated. Tian et al. evaluated the effect of drying processes on the anthocyanin and moisture content of purple sweet potatoes, using HSI to estimate and generate a visual map of the distribution patterns in processed purple sweet potatoes (Tian et al., 2021). Xie et al. developed a method using 750 nm/900 nm ratio imaging combined with machine vision to accurately quantify spice uniformity in cooked meat products (Xie et al., 2022a).

However, the adoption of spectral technologies in tobacco remains at a developing stage. As summarized in Table 1, current research indicates that NIR spectroscopy is the most widely used technique, offering reliable quantitative prediction of key constituents such as nicotine, sugars, and moisture in tobacco. Non-imaging spectroscopic methods dominate present applications, while hyperspectral imaging is still relatively underexplored in this specific domain, despite its potential for spatially resolved chemical mapping. Existing technologies continue to encounter certain limitations. Conventional spectroscopy captures limited molecular information per data point, which makes a comprehensive characterization of complex components challenging. Models built on a single spectral range tend to generalize poorly, which compromises the precision of simultaneous detection of multiple indicators (Liang et al., 2023). In addition, within the manufacturing facility, factors such as tobacco humidity, granularity variations, equipment vibration, and fluctuations in environmental temperature and humidity introduce spectral noise and lower the signal-to-noise ratio (Tiplady et al., 2019). Existing algorithms lack sufficient robustness against strong noise backgrounds, resulting in substantial fluctuations in detection outcomes. Data processing steps such as spectral denoising and baseline correction have a significant impact on model performance. The choice among existing methods depends largely on expert experience and lacks adaptive optimization strategies, which can lead to overfitting or information loss (Shi et al., 2023).

Table 1. Summary of prior studies on spectral analysis for tobacco.

To address the aforementioned challenges, this study proposes a hyperspectral imaging-based framework for quantitative analysis of blended tobacco composition. The framework is centered around two core scientific problems: improving the accuracy of simultaneous multi-component analysis and enhancing anti-interference capability in complex industrial environments. Its key innovations include a feature-level fusion strategy for complementary visible-near-infrared (Vis-NIR) and near-infrared (NIR) spectral data and an anti-interference modeling method that integrates the minimum covariance determinant (MCD) anomaly detection algorithm with partial least squares regression (PLSR). Experimental results demonstrate that the proposed spectral fusion modeling approach improves the prediction accuracy of tobacco silk components by 14.83% compared to single-spectral methods, achieving an R2 value of 0.8873. This method not only significantly enhances data processing efficiency and reproducibility but also provides a new perspective for intelligent quality detection in the tobacco industry, thereby supporting its transition toward digitalization and automation.

2 Materials and methods

2.1 Experimental materials

The experimental samples utilized in this study were provided by China Tobacco Sichuan Industrial Co., Ltd., and consisted of four tobacco components with divergent physicochemical properties, including (1) tobacco silk (60-100%); (2) cut stem (0-20%); (3) fermented cut stem (0-20%); and (4) expanded tobacco silk (0-20%). These components were obtained directly from the production line, with their respective moisture contents measured as follows: approximately 13.3% for tobacco silk, 12.5% for cut stem, 12.5% for fermented cut stem, and in the range of 12.0–12.6% for expanded tobacco silk. A total of 30 distinct blended tobacco samples were meticulously prepared, each with a standard mass of 20 g. To ensure diversity and representativeness, the specific blending proportions were determined using a constrained random number generation algorithm. The algorithm was designed to independently generate the proportion of each component within its respective allowable range while constraining the sum of all components to exactly 100%. This approach ensured uniform coverage across the defined proportion space and prevented over-representation of extreme or central proportion combinations. The resulting distribution of the four components is illustrated in Figure 1. During sample preparation, each component was meticulously weighed using a precision electronic balance.
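The constrained random generation described above can be illustrated with a minimal sketch. The article does not give the algorithm, so the rejection-sampling scheme below is an assumption: the three minor components are drawn uniformly within their ranges, tobacco silk takes the remainder, and a draw is kept only if the remainder lies within 60-100%. The function names, the rounding to 0.1%, and the seed are hypothetical choices made for illustration.

```python
import random

# Allowed ranges (%) taken from the text; the keys are shortened labels.
RANGES = {
    "tobacco_silk": (60.0, 100.0),
    "cut_stem": (0.0, 20.0),
    "fermented_cut_stem": (0.0, 20.0),
    "expanded_tobacco_silk": (0.0, 20.0),
}

def sample_blend(rng):
    """Draw one blend by rejection sampling (an assumed scheme, not the authors')."""
    minors = [name for name in RANGES if name != "tobacco_silk"]
    low, high = RANGES["tobacco_silk"]
    while True:
        # Draw each minor component independently within its own range.
        blend = {name: round(rng.uniform(*RANGES[name]), 1) for name in minors}
        # Tobacco silk takes whatever is left so that the total is 100%.
        silk = round(100.0 - sum(blend.values()), 1)
        if low <= silk <= high:
            blend["tobacco_silk"] = silk
            return blend

def generate_blends(n_blends=30, seed=0):
    """Generate n_blends proportion sets with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_blend(rng) for _ in range(n_blends)]

blends = generate_blends()
# Component masses for a 20 g sample follow directly from the proportions.
masses_g = [{name: pct / 100.0 * 20.0 for name, pct in b.items()} for b in blends]
```

Rejection sampling keeps every accepted blend inside all four ranges at once, at the cost of discarding draws whose minor components sum to more than 40%.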

Figure 1
 Four bar charts display proportions for different samples. (A): Blue bars, mean 72.8%, standard deviation 7.1%, range [60.8%, 89.2%]. (B): Orange bars, mean 8.9%, standard deviation 5.6%, range [0.0%, 17.6%]. (C): Green bars, mean 8.5%, standard deviation 5.0%, range [0.1%, 18.6%]. (D): Purple bars, mean 9.8%, standard deviation 5.7%, range [0.1%, 19.8%]. Each chart has proportions on the y-axis and samples on the x-axis.

Figure 1. Distribution of four tobacco components: (A) tobacco silk proportion distribution; (B) cut stem proportion distribution; (C) fermented cut stem proportion distribution; (D) expanded tobacco silk. In all subplots, the x-axis corresponds to sample ID (1–30), and the y-axis represents the proportion of each component within the corresponding sample (%).

The blended tobacco was accurately weighed, poured into a box, shaken until thoroughly mixed, and then laid evenly on black A4 cardboard for hyperspectral scanning. To ensure consistency, each sample was carefully spread into a thin layer of uniform thickness without exposing the substrate. To minimize the effects of mixing inhomogeneity and spatial distribution variations, each proportion was scanned in triplicate: after each scanning cycle, the sample was collected, remixed, and re-spread uniformly before the next scan. This process was repeated three times per blending ratio, resulting in a total of 90 samples for subsequent processing and analysis.

2.2 Data acquisition

A hyperspectral imaging system was used to collect spectral data of blended tobacco with varying blending ratios. The system comprises a visible near-infrared hyperspectral imager, a near-infrared hyperspectral imager, a CCD camera, optical lenses, two 150 W tungsten-halogen line light sources, an electronically controlled mobile platform, and a computer control terminal. The visible near-infrared imaging spectrometer is the ImSpector V10E (Specim, Spectral Imaging Ltd., Oulu, Finland), which acquires hyperspectral images in the 400–1000 nm range with a spectral resolution of 2.8 nm. For Vis-NIR acquisition, the translation stage speed is set to 7.5 mm/s, the camera exposure time is 21.231 ms, and the working distance between the sample and lens is 37.1 mm. The near-infrared imaging spectrometer is the ImSpector N17E (Specim, Spectral Imaging Ltd., Oulu, Finland), operating in the 900–1700 nm range with a spectral resolution of 5 nm. For NIR acquisition, the translation stage speed is 9.84 mm/s, the exposure time is 30 ms, and the working distance is 32.7 mm. To mitigate the impact of ambient light on spectral acquisition, the imaging instrument and its supporting equipment are enclosed in a black box. In operation, the computer control terminal starts each system component. The motor drives the samples along the conveyor belt, the imaging spectrometer with the camera and lens acquires hyperspectral images of the samples, and the acquired image data are transmitted to the computer control terminal, completing the data acquisition process. Representative captured images are shown in Figure 2.

Figure 2
Three images labeled (A), (B), and (C) demonstrate the imaging effects of tobacco across different spectral bands.  (A) is an RGB color image, appearing predominantly dark brown; (B) is a Vis-NIR image, displaying a mixture of brown and lighter-toned areas; (C) is a NIR image, exhibiting an overall color tone ranging from light gray to white.

Figure 2. Representative tobacco images under different imaging modalities: (A) RGB image; (B) Vis-NIR image; (C) NIR image.

Significant noise is introduced due to light intensity fluctuation and sensor dark current during the acquisition process, which leads to degradation of hyperspectral image quality and directly affects data accuracy. To address this issue, this study used a black-and-white correction algorithm to process the raw data. The calibration was systematically performed before each measurement session using a high-reflectivity white reference board and a dark background image captured with the lens covered. This procedure was repeated every 30 minutes throughout the acquisition process to maintain calibration stability. The equation is shown in Equation 1:

$$R = \frac{I_{raw} - I_{dark}}{I_{ref} - I_{dark}} \tag{1}$$

where $R$ is the corrected sample image; $I_{raw}$ is the original image of the sample; $I_{dark}$ is the dark background correction image; and $I_{ref}$ is the whiteboard correction image.
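As an illustration of Equation 1, the correction can be applied element-wise to a hyperspectral cube. The sketch below assumes that the raw, dark, and white reference arrays share the same shape (or broadcast against each other); the function name, the `eps` guard, and the toy values are assumptions, not details from the study.

```python
import numpy as np

def black_white_correction(raw, dark, white, eps=1e-9):
    """Apply Equation 1 element-wise: R = (I_raw - I_dark) / (I_ref - I_dark)."""
    raw = np.asarray(raw, dtype=float)
    dark = np.asarray(dark, dtype=float)
    white = np.asarray(white, dtype=float)
    denom = white - dark
    # Guard against division by zero where the white and dark references coincide.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (raw - dark) / denom

# Toy cube: 2 x 2 pixels, 3 bands (synthetic numbers, for illustration only).
dark = np.full((2, 2, 3), 100.0)
white = np.full((2, 2, 3), 4100.0)
raw = np.full((2, 2, 3), 2100.0)
reflectance = black_white_correction(raw, dark, white)  # every element equals 0.5
```

A pixel equal to the dark reference maps to 0 and a pixel equal to the white reference maps to 1, so the corrected values can be read as relative reflectance.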

2.3 Methods of analysis

2.3.1 PLSR regression modeling process

PLSR is a robust multivariate analysis method particularly suited to high-dimensional spectral data. Its demonstrated efficacy in small-sample, high-dimensional scenarios has led to widespread adoption in agricultural and food science applications of near-infrared spectroscopy (Arnese-Feffin et al., 2024; Li et al., 2024b; Zheng et al., 2024). The method simultaneously decomposes the predictor variable matrix X and the response variable matrix Y into a series of orthogonal latent variables (LVs), which are linear combinations of the original variables. The extraction process is based on the criterion of maximizing the covariance between X and Y (Fernández-Habas et al., 2022; Li et al., 2024c). Before modeling, the raw spectral data must undergo a series of processing steps. The spectra are first preprocessed to eliminate background noise and baseline drift. The optimal number of LVs was then determined using Leave-One-Out Cross-Validation (LOOCV), with the first local minimum of the Root Mean Square Error of Cross-Validation (RMSECV) as the selection criterion (Deng et al., 2024).
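The LV selection rule can be sketched as follows. This is a minimal single-response PLS (NIPALS) implementation written in NumPy for illustration only; it is not the authors' code, and the function names, the synthetic data, and the upper limit of 8 LVs are assumptions. Rows are samples and columns are wavelengths.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit a single-response PLS model (NIPALS); returns centring terms and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr                 # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xr @ w                    # scores of this latent variable
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        c = yr @ t / tt               # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients in X space
    return x_mean, y_mean, beta

def pls1_predict(model, X):
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta

def loocv_rmsecv(X, y, max_lv):
    """RMSECV for 1..max_lv latent variables using leave-one-out cross-validation."""
    n = len(y)
    rmsecv = []
    for n_lv in range(1, max_lv + 1):
        residuals = np.empty(n)
        for i in range(n):
            keep = np.arange(n) != i
            model = pls1_fit(X[keep], y[keep], n_lv)
            residuals[i] = y[i] - pls1_predict(model, X[i:i + 1])[0]
        rmsecv.append(float(np.sqrt(np.mean(residuals ** 2))))
    return rmsecv

def first_local_minimum(values):
    """Number of LVs (1-based) at the first value not improved on by its successor."""
    for k in range(len(values) - 1):
        if values[k] <= values[k + 1]:
            return k + 1
    return len(values)

# Synthetic demonstration data (not the tobacco spectra).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=30)
rmsecv = loocv_rmsecv(X, y, max_lv=8)
n_lv = first_local_minimum(rmsecv)
```

Stopping at the first local minimum rather than the global one favors a smaller model, which is a common guard against overfitting when samples are few.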

To enhance the model's performance, spectral preprocessing methods, including Savitzky-Golay smoothing (SG), derivative, standard normal variate (SNV), and multiplicative scatter correction (MSC), were employed to eliminate scattering effects, noise interference, and baseline drift. This process can improve the prediction accuracy and robustness of the regression model (Zhu et al., 2025). For different spectral data and target variables, multiple preprocessing methods were compared to determine the optimal approach for each case. To further improve the prediction accuracy, robustness, and interpretability of the PLSR model, several wavelength selection methods were applied after spectral preprocessing. These include Variable Importance in Projection (VIP), Successive Projections Algorithm (SPA), and Competitive Adaptive Reweighted Sampling (CARS), which identify a limited number of key wavelengths that correlate strongly with the target variables (Xie et al., 2022b). These methods eliminate wavelengths that have a low signal-to-noise ratio or are irrelevant to the target variables, and they reduce the impact of redundant wavelengths. This step therefore both improves prediction accuracy and aids the physical interpretation of quantitative spectral analysis.
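Two of the scatter-correction steps named above, SNV and MSC, are simple enough to sketch directly. The code below is a generic illustration under stated assumptions rather than the AutoPF implementation: rows are samples, columns are wavelengths, MSC uses the mean spectrum as its reference unless another is supplied, and the toy spectra are synthetic. SG smoothing and derivatives are not shown.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        # Least-squares fit s ~ intercept + slope * ref, then invert the fit.
        slope, intercept = np.polyfit(ref, s, deg=1)
        corrected[i] = (s - intercept) / slope
    return corrected

# Synthetic spectra: one shape with different multiplicative and additive effects.
axis = np.linspace(0.0, 1.0, 50)
base = 0.4 + 0.2 * np.sin(4 * axis) + 0.3 * axis
spectra = np.array([a * base + b for a, b in [(1.0, 0.0), (1.2, 0.05), (0.8, -0.02)]])
snv_out = snv(spectra)  # all rows coincide after SNV
msc_out = msc(spectra)  # all rows coincide with the mean spectrum after MSC
```

Because the three toy spectra differ only by a gain and an offset, both corrections collapse them onto a single curve, which is the effect these methods aim for on real scatter.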

To rigorously evaluate model performance and generalizability within the available dataset, a full cross-validation procedure was adopted for all PLSR models, thereby forgoing the use of a separate hold-out test set. This approach is standard in spectroscopic calibration when sample availability is constrained, as it allows for a robust estimation of predictive accuracy by iteratively using the entire dataset for both training and validation.

2.3.2 Outlier detection

The presence of outliers can have multiple negative impacts on statistical analysis and regression modeling, including but not limited to: undermining the assumption of sample normality, increasing the variability of results and estimation errors, reducing the efficacy of hypothesis testing, introducing systematic bias, and weakening the predictive accuracy of the model (Zheng et al., 2024). Therefore, when constructing multiple regression models, the identification and treatment of outliers is a critical aspect of ensuring reliable data.

In PLSR modeling, the identification of outliers is a pivotal step in ensuring the robustness of the model. In this study, the MCD algorithm was employed for outlier detection. As a robust statistical method, MCD identifies the subset of h samples (where h = 0.75n of the n total samples) that minimizes the determinant of its covariance matrix (Huang et al., 2024). This approach captures the core distribution of the data while providing robust estimates of both the location parameters and the scatter matrix, making it particularly suitable for detecting anomalies in complex spectral datasets. The selected subset represents the most tightly clustered portion of the data, and observations that deviate from this core group are identified as outliers (Hussain et al., 2024).

In addition, since the MCD algorithm is most effective for low-dimensional data, we first applied PCA to reduce the dimensionality of the spectral data (Alrawashdeh et al., 2024). This step ensures that the number of samples in the input matrix exceeds the number of variables, making MCD-based outlier detection feasible. The variance explained ratio in PCA is a pivotal indicator of the proportion of original data retained by each principal component. The sum of the variance explained ratios of the first m principal components is denoted as the cumulative variance explained ratio, which reflects the aggregate proportion of original data retained.
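The PCA-then-MCD procedure can be sketched in NumPy. This is a simplified approximation based on repeated concentration steps from random starts, not the implementation used in the study; it is restricted to two-dimensional scores so that the chi-square quantities have closed forms. The function names, the number of random starts, the consistency correction, and the synthetic data are all assumptions.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Scores on the leading PCs (rows = samples) and cumulative explained variance."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    cumulative = np.cumsum(S ** 2) / np.sum(S ** 2)
    return Xc @ Vt[:n_components].T, cumulative[:n_components]

def _sq_mahalanobis(Z, mu, cov):
    diff = Z - mu
    return np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)

def mcd_outliers_2d(Z, support_fraction=0.75, n_starts=50, quantile=0.975, seed=0):
    """Approximate MCD on 2-D scores; returns robust distances, cutoff, outlier flags."""
    n, p = Z.shape
    if p != 2:
        raise ValueError("this sketch uses chi-square closed forms valid for 2 columns only")
    rng = np.random.default_rng(seed)
    h = int(np.ceil(support_fraction * n))
    best_det, best = np.inf, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=p + 1, replace=False)
        for _ in range(30):  # concentration steps: keep the h closest points
            cov = np.cov(Z[subset], rowvar=False)
            d2 = _sq_mahalanobis(Z, Z[subset].mean(axis=0), cov)
            new_subset = np.sort(np.argsort(d2)[:h])
            if np.array_equal(new_subset, subset):
                break
            subset = new_subset
        det = np.linalg.det(np.cov(Z[subset], rowvar=False))
        if det < best_det:
            best_det, best = det, subset
    # Consistency factor so the subset covariance is calibrated for normal data.
    alpha = h / n
    q_alpha = -2.0 * np.log(1.0 - alpha)                          # chi2(2) quantile
    chi2_4_cdf = 1.0 - np.exp(-q_alpha / 2.0) * (1.0 + q_alpha / 2.0)
    cov = (alpha / chi2_4_cdf) * np.cov(Z[best], rowvar=False)
    robust_dist = np.sqrt(_sq_mahalanobis(Z, Z[best].mean(axis=0), cov))
    cutoff = np.sqrt(-2.0 * np.log(1.0 - quantile))               # sqrt of chi2(2) quantile
    return robust_dist, cutoff, robust_dist > cutoff

# Synthetic spectra driven by two latent factors; the last three samples are planted outliers.
rng = np.random.default_rng(0)
latent = rng.normal(size=(90, 2))
latent[-3:] = [[8.0, 8.0], [9.0, -8.0], [-10.0, 9.0]]
X = latent @ rng.normal(size=(2, 120)) + 0.01 * rng.normal(size=(90, 120))
scores, cum_var = pca_scores(X, n_components=2)
robust_dist, cutoff, is_outlier = mcd_outliers_2d(scores)
```

With the 97.5% quantile and two components, the cutoff on the robust distance is about 2.72, which corresponds to the boundary of a 97.5% tolerance ellipse under a bivariate normal assumption.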

2.3.3 Evaluation indicators

To assess the predictive performance of the established model, two key indicators were used. The coefficient of determination (R2) measures the model's ability to explain the variation in the data, with values ranging from 0 to 1; the closer R2 is to 1, the better the model fits. The root mean square error (RMSE) measures prediction accuracy as the deviation between the predicted and measured values; the smaller the RMSE, the better the predictive ability of the model. R2 is given by Equation 2, and RMSE is defined by Equation 3:

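$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{3}$$

where $y_i$ is the measured concentration value of sample $i$, $\hat{y}_i$ is the concentration value predicted by the model, $\bar{y}$ is the mean of the measured concentration values, and $n$ is the number of samples.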
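Equations 2 and 3 translate directly into code. The sketch below uses hypothetical function names and made-up proportions purely to show the arithmetic.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination as in Equation 2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error as in Equation 3."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up proportions (%), for illustration only.
measured = [60.0, 70.0, 80.0, 90.0]
predicted = [62.0, 69.0, 81.0, 88.0]
r2_value = r_squared(measured, predicted)   # 1 - 10/500 = 0.98
rmse_value = rmse(measured, predicted)      # sqrt(10/4), about 1.58
```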

3 Results and discussion

3.1 Raw spectral modeling results

3.1.1 Raw spectra of Vis-NIR and NIR

To mitigate instrument noise, optimal spectral intervals were selected from the independently acquired datasets: the Vis-NIR spectra (460–1020 nm, 400 wavelengths) and the NIR spectra (975–1713 nm, 220 wavelengths). The final experimental data comprise two matrices of dimensions 400 × 90 and 220 × 90, each containing 90 samples. The reflectance spectra of tobacco samples, acquired from Vis-NIR and NIR ranges, are presented in Figure 3. In addition, although the overlapping region (975–1020 nm) shows noticeable reflectance differences between the Vis–NIR and NIR instruments due to their distinct detector responses, this region was retained in the subsequent wavelength-selection stage so that the AutoPF framework (Chen et al., 2025) and its data-driven algorithms could objectively determine whether these wavelengths contain useful information.

Figure 3
Line graph showing reflectance versus wavelength from 400 to 1800 nanometers. Multiple colored lines represent different data sets with a general upward trend, a gap between 900 and 970 nanometers, and fluctuation beyond 1000 nanometers.

Figure 3. The original Vis-NIR spectra (460–1020 nm) and NIR spectra (975–1713 nm).

The spectral curve of tobacco exhibits characteristic changes. The Vis-NIR spectrum of tobacco shows an overall increase in reflectance. A key absorption valley at 650–700 nm is due to chlorophyll absorbing red light. In the NIR region, the spectrum fluctuates downward. The distinct absorption trough at 1450–1500 nm corresponds to water and cellulose (O-H and C-H vibrations). The characteristic peak at 1650–1700 nm mainly reflects the presence of nicotine (N-H vibration) and polyphenolic compounds (Workman and Weyer, 2007). These specific wavelength bands effectively map onto the chemical fingerprints of tobacco’s major components—chlorophyll, water, cellulose, nicotine, and polyphenols. Therefore, they serve as important indicators for analyzing tobacco composition and differentiating blending materials.

3.1.2 Results of quantitative analysis of tobacco blending components

Spectral fusion of Vis-NIR and NIR provides more comprehensive and complementary information. Vis-NIR mainly carries color information and other features related to electronic transitions, while NIR is dominated by the overtone and combination bands of molecular vibrations. Fusing the two therefore describes a substance more completely, especially when both its color and its chemical composition vary. In quantitative analysis, a spectral fusion strategy can also enhance model robustness. A single spectral range may be strongly affected by noise or interference in some cases, whereas fused data can reduce this effect by drawing on information from multiple ranges, improving the stability and predictive ability of the model.
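One common form of feature-level fusion is simple concatenation of the two spectral blocks along the wavelength axis. The article does not specify the fusion operation, so the sketch below is an assumption; it follows the 400 × 90 and 220 × 90 layout (wavelengths × samples) reported for the data, and the random arrays stand in for real spectra.

```python
import numpy as np

def fuse_spectra(vis_nir, nir):
    """Concatenate two blocks along the wavelength axis (rows = wavelengths, columns = samples)."""
    vis_nir = np.asarray(vis_nir, dtype=float)
    nir = np.asarray(nir, dtype=float)
    if vis_nir.shape[1] != nir.shape[1]:
        raise ValueError("both blocks must contain the same samples in the same order")
    return np.vstack([vis_nir, nir])

# Placeholder data with the dimensions reported in the article.
rng = np.random.default_rng(0)
vis_nir = rng.random((400, 90))
nir = rng.random((220, 90))
fused = fuse_spectra(vis_nir, nir)  # shape (620, 90)
```

Plain concatenation applies no block scaling, so a block with more wavelengths or larger variance can dominate the model; whether any weighting was used in the study is not stated.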

In this study, Vis-NIR, NIR, and their fusion spectra were used to quantitatively analyze the four blending components in tobacco samples. The regression results of the PLSR model are shown in Table 2. For tobacco silk, fermented cut stem, and expanded tobacco silk, the modeling accuracy of the fused Vis-NIR and NIR spectra is higher than that of Vis-NIR or NIR alone, with R2 values of 0.7727, 0.7046, and 0.6676, respectively. This suggests that fusing Vis-NIR and NIR can enhance the prediction accuracy of quantitative analysis through multidimensional complementary information and synergistic cross-range features. For cut stem, however, Vis-NIR alone is marginally more accurate than the fused spectra. This can be attributed to the weaker correlation between cut stem and the NIR information: fusion introduces extraneous noise variables, which reduces prediction accuracy relative to Vis-NIR alone.

Table 2. Results of PLSR models for the original spectra.

3.2 Rejection of anomalous samples by MCD

3.2.1 Characteristics of the distribution of anomalous samples

In this study, PCA was applied to reduce the dimensionality of the spectral data before performing outlier detection using the MCD algorithm. The selection of principal components was based on cumulative variance contribution. The first two principal components (PC1-PC2) account for more than 97% of the total variance for all three spectral datasets (99.67% for Vis-NIR, 99.10% for NIR, and 97.68% for the fusion spectra of Vis-NIR and NIR). The variance contributions of PC3-PC5 are all below 2%, indicating that these components mainly reflect noise or minor fluctuations. Therefore, retaining only the first two principal components ensures that almost all meaningful spectral information is preserved while avoiding noise-dominated redundant components that may reduce the robustness of MCD. Moreover, MCD achieves more stable covariance estimation in a lower-dimensional space, and the rapid decline in variance contribution beyond PC2 provides a consistent criterion for the three spectral datasets.

For the Vis-NIR dataset, the cumulative variance explained by PC1-PC2 reached 99.67%, confirming that these components retain nearly all of the original spectral information. The results of using MCD to reject abnormal Vis-NIR samples are shown in Figure 4, where the red ellipse in Figure 4A represents the 97.5% tolerance ellipse. Figure 4A shows that five observations fell outside the tolerance ellipse, with samples 43, 44, and 45 clearly departing from the distribution of the overall sample; such samples can affect the precision of subsequent quantitative analyses. Similarly, Figure 4B shows that the robust distances of these five observations in the MCD were greater than the cutoff value. They were therefore judged to be anomalous, rejected, and excluded from the subsequent quantitative analysis.

Figure 4
Scatter plots in three rows, each with two plots on multivariate analysis. The left plots (A, C, E) show data points with 97.5% tolerance ellipses labeled X1 and X2. Points 43, 44, and 45 in A; 75, 78, and 87 in C; 33, 43, and 47 in E are notable. The right plots (B, D, F) display robust distances against an index, with a red horizontal line marking a threshold. Points above the line include 43, 44, and 45 in B; 75, 78, and 87 in D; 33 and 47 in F.

Figure 4. Results in MCD: (A) Data distribution of Vis-NIR dataset; (B) Robust distance of Vis-NIR dataset; (C) Data distribution of NIR dataset; (D) Robust distance of NIR dataset; (E) Data distribution of fusion spectra dataset; (F) Robust distance of fusion spectra dataset. The red ellipse in (A, C, E) represents the 97.5% tolerance ellipse, and the red horizontal line in (B, D, F) is the threshold for determining outliers. Points that fall outside the ellipse or above the red line will be marked as outliers.

For the NIR dataset, the cumulative variance explained by the first two principal components after PCA reaches 99.10%, so the first two principal components were likewise used for MCD-based rejection of anomalous samples. In Figure 4C, six observations fall outside the tolerance ellipse and are considered inconsistent with the distribution of the overall sample. These six observations were also identified as anomalous in Figure 4D because their robust distances were larger than the cutoff value. They were removed to improve the accuracy of the subsequent quantitative analysis.

The fusion spectra dataset of Vis-NIR and NIR was subjected to PCA dimensionality reduction, and the cumulative variance explained by the first two principal components amounted to 97.68%. These two principal components were likewise used for MCD-based rejection of anomalous samples. As shown in Figure 4E, all observations fall within the tolerance ellipse, which indicates that for the fusion spectra all samples come from the same distribution and there are no anomalous samples. Figure 4F also shows that the robust distances of all samples in the MCD are less than the cutoff value. All 90 samples were therefore used for subsequent quantitative analysis.

3.2.2 Quantitative analysis results after removal of outliers

MCD-based removal of anomalous samples was performed on the Vis-NIR and NIR datasets; five and six spectra were removed, leaving 85 and 84 samples, respectively. The fusion spectra of Vis-NIR and NIR contained no obvious anomalous samples, so the number of samples remained at 90. The modeling results after MCD outlier removal on the raw spectra are shown in Table 3.

Table 3. Results of PLSR models after MCD eliminates abnormal samples.

After the five anomalous samples in the Vis-NIR dataset were excluded, the regression results of the PLSR model improved for expanded tobacco silk: R2 rose from 0.4644 to 0.5574, a relative improvement of 20.03%. The prediction results changed little when the target variables were tobacco silk, cut stem, and fermented cut stem. A possible explanation is that the anomalies lie in spectral bands unrelated to these three target variables, whose prediction depends on other, stable bands, so removing the anomalous samples has little effect on accuracy. The six anomalous samples identified by MCD in the NIR dataset were removed and the model was rebuilt on the remaining spectra; for each of the four blending components, the accuracy of the NIR-based PLSR model increased compared with that before removal. For tobacco silk, the R2 of the NIR-PLSR model improved from 0.7717 to 0.8143, exceeding the prediction accuracy of the PLSR models built on the Vis-NIR and fusion spectra. For the other three target variables, the accuracy of the NIR-PLSR model improved after the abnormal samples were removed but remained below that of the PLSR model built on the fusion spectra. For the fusion spectra, the PLSR results in Table 3 are identical to those in Table 2 because MCD did not exclude any samples. Overall, when the target variable was tobacco silk, NIR modeling outperformed both Vis-NIR and the fusion spectra; when the target variables were cut stem, fermented cut stem, and expanded tobacco silk, the fusion spectra gave the best modeling accuracy among the three.
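The reported 20.03% is consistent with a relative change in R2, which can be checked from the two values given for expanded tobacco silk. The second line applies the same formula to the NIR tobacco silk values; that percentage is our own arithmetic and is not stated in the article.

```latex
\[
\frac{0.5574 - 0.4644}{0.4644} \times 100\% = \frac{0.0930}{0.4644} \times 100\% \approx 20.03\%
\]
\[
\frac{0.8143 - 0.7717}{0.7717} \times 100\% = \frac{0.0426}{0.7717} \times 100\% \approx 5.52\%
\]
```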

3.3 Spectral data processing

3.3.1 Spectral processing results

The feasibility of enhancing model performance through spectral processing was further investigated. A range of processing methods provided by the AutoPF framework was applied to the spectral data for each tobacco component to obtain the best PLSR results. Figure 5 presents the processed Vis-NIR spectra with each of the four blending components as the target variable. The Vis-NIR spectra were preprocessed with moving average smoothing, which filters out high-frequency noise by replacing each spectral point with the mean of the points within a window. After several wavelength selection methods were compared, CARS was chosen to select characteristic wavelengths in Vis-NIR, retaining wavelengths strongly correlated with the target variable to improve model performance. The blue downward-pointing triangles in Figure 5 mark the wavelength points selected by CARS. For tobacco silk (Figure 5A), CARS selected 37 of the 400 Vis-NIR wavelength points, concentrated at 896–1018 nm, which indicates that this region is strongly correlated with tobacco silk. For cut stem (Figure 5B) and expanded tobacco silk (Figure 5D), CARS selected 88 wavelength points in Vis-NIR. The strongly correlated ranges for cut stem were mainly 636.18–773.49 nm and 892–951 nm, while those for expanded tobacco silk were 461–595 nm, 634–739 nm, and 978–1020 nm. In Figure 5C, the 51 wavelength points selected by CARS for fermented cut stem are distributed very uniformly. This may be related to the mechanism of CARS: its repeated random sampling averages out the differences between the wavelength distributions retained in different subsets. In addition, adjacent Vis-NIR wavelengths are usually highly collinear, and CARS may retain several collinear wavelengths to compensate for information loss, so the selected points end up evenly spread across the spectrum.

Figure 5. Vis-NIR processed with different blending components: (A) tobacco silk; (B) cut stem; (C) fermented cut stem; (D) expanded tobacco silk. The x-axis represents wavelength (nm) and the y-axis indicates reflectance.

Figure 6 shows the processed NIR spectra for the different blending components. For tobacco silk (Figure 6A) and cut stem (Figure 6B), moving average smoothing combined with the first-order derivative was used for preprocessing. The first-order derivative amplifies changes in the slope of the spectral curve, which separates overlapping peaks and highlights weak absorption peaks. For fermented cut stem (Figure 6C), SG smoothing was used; it denoises by fitting a polynomial to the data points within a window while preserving the peak shape and amplitude of the spectra. For expanded tobacco silk (Figure 6D), the best regression accuracy was obtained without preprocessing, so the NIR spectra were left unprocessed. Characteristic wavelengths were then selected from the 220 NIR wavelength points. For cut stem (Figure 6B), variable selection did not improve the PLSR model, so the full spectrum was used for modeling. For the other blending components, CARS was chosen as the wavelength selection method after different variable selection methods were compared. For tobacco silk (Figure 6A), the 43 wavelength points selected by CARS were mainly concentrated at 1271–1497 nm, which indicates that this region is strongly correlated with tobacco silk. In Figure 6C and Figure 6D, the 39 and 93 wavelength points selected by CARS are distributed fairly uniformly across the spectrum. Notably, for fermented cut stem (Figure 6C), CARS selected no wavelength points in the 1350–1590 nm region, which suggests that this region carries little useful information for predicting fermented cut stem.
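The three preprocessing operations named in this section (moving average smoothing, first-order derivative, and SG smoothing) can be sketched as below. Window sizes, the polynomial order, the helper names, and the 900–1700 nm grid of 220 points are illustrative assumptions; the section does not report the settings actually used.

```python
import numpy as np
from scipy.signal import savgol_filter


def moving_average(spectra, window=5):
    """Replace each point by the mean of a centred window (odd window assumed)."""
    pad = window // 2
    padded = np.pad(spectra, ((0, 0), (pad, pad)), mode="edge")
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, padded
    )


def first_derivative(spectra, step=1.0):
    """Finite-difference first derivative along the wavelength axis."""
    return np.gradient(spectra, step, axis=1)


def sg_smooth(spectra, window=11, polyorder=2):
    """Savitzky-Golay smoothing: local polynomial fit within each window."""
    return savgol_filter(spectra, window_length=window, polyorder=polyorder, axis=1)


# Hypothetical wavelength grid and noisy spectra (rows are samples).
wl = np.linspace(900.0, 1700.0, 220)
step = wl[1] - wl[0]
rng = np.random.default_rng(0)
clean = 0.5 + 0.1 * np.sin(wl / 120.0)
noisy = clean + rng.normal(0.0, 0.01, (90, wl.size))

smoothed = moving_average(noisy)
derivative = first_derivative(smoothed, step)  # smoothing followed by derivative
sg = sg_smooth(noisy)
print(noisy.shape, smoothed.shape, derivative.shape, sg.shape)
```

Smoothing before differentiation matters because the derivative amplifies high-frequency noise along with the slope changes it is meant to highlight.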

Figure 6. NIR processed with different blending components: (A) tobacco silk; (B) cut stem; (C) fermented cut stem; (D) expanded tobacco silk. The x-axis represents wavelength (nm) and the y-axis indicates reflectance.

For the fused Vis-NIR and NIR spectra, feature wavelength selection was likewise performed after preprocessing to enhance model performance. The full Vis-NIR and NIR spectra were used for fusion. Because the Vis-NIR and NIR bands overlap, Figure 7 is plotted against variable index rather than wavelength, and the fused spectra contain 620 variables in total (400 from Vis-NIR and 220 from NIR).
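A minimal sketch of how two full spectra can be joined into 620 variables is given below. It assumes plain column-wise concatenation with samples in rows and 0-based variable indices; the helper names `fuse` and `locate` and the random matrices are hypothetical, and any block scaling applied by the authors before fusion is not reproduced.

```python
import numpy as np

N_VIS, N_NIR = 400, 220  # wavelength points per block, as stated in the text


def fuse(vis_nir, nir):
    """Concatenate two spectral blocks measured on the same samples."""
    if vis_nir.shape[0] != nir.shape[0]:
        raise ValueError("both blocks must contain the same samples")
    return np.hstack([vis_nir, nir])


def locate(var_index):
    """Map a 0-based fused variable index to (block name, index within block)."""
    if not 0 <= var_index < N_VIS + N_NIR:
        raise IndexError("variable index outside the fused range")
    if var_index < N_VIS:
        return "Vis-NIR", var_index
    return "NIR", var_index - N_VIS


# Hypothetical data: 90 samples per block.
rng = np.random.default_rng(0)
vis_nir = rng.random((90, N_VIS))
nir = rng.random((90, N_NIR))

fused = fuse(vis_nir, nir)
print(fused.shape)   # (90, 620)
print(locate(569))   # ('NIR', 169)
```

Indexing by variable number avoids the ambiguity of the overlapping wavelength region, where the same nominal wavelength appears once in each block.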

Figure 7. Vis-NIR and NIR fusion spectra processed with different blending components: (A) tobacco silk; (B) cut stem; (C) fermented cut stem; (D) expanded tobacco silk. The x-axis represents the variable index and the y-axis indicates reflectance.

For tobacco silk, the spectra were preprocessed with SG smoothing and MSC to obtain the data in Figure 7A. MSC fits a linear relationship between each sample spectrum and a reference spectrum; this corrects multiplicative scattering caused by uneven particle distribution or differences in optical path length, and removes baseline shifts caused by variations in light source intensity or sample thickness. Preprocessing methods were chosen according to the prediction accuracy of the model under each method: moving average smoothing was used for cut stem and expanded tobacco silk, and SG smoothing for fermented cut stem. After different variable selection methods were compared, CARS was used for feature wavelength selection on the fused spectra. In Figure 7A, the 135 variables selected by CARS are fairly evenly distributed, with wavelength points from both Vis-NIR and NIR retained for modeling. In contrast, in Figure 7B and Figure 7C, CARS selected no variables in the interval [570, 620], which corresponds to 1538–1713 nm in NIR. When variable selection was performed on NIR alone, however, CARS did choose wavelength points in this region. A possible reason is that some Vis-NIR bands carry information more directly related to the target variables, leading CARS to prioritize them over the 1538–1713 nm band in the fused spectra. For cut stem and fermented cut stem, CARS selected 95 and 60 variables from the fused spectra, respectively. For expanded tobacco silk (Figure 7D), CARS selected 75 of the 620 variables, mainly concentrated in the intervals [21, 46], [127, 203], [490, 539], and [585, 614]. These correspond to 489–524 nm and 637–744 nm in Vis-NIR and to 1275–1440 nm and 1595–1693 nm in NIR, which indicates that CARS draws on information from both spectral ranges for the subsequent quantitative analysis.
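The linear correction that MSC performs can be written in a few lines. The sketch below assumes the mean spectrum as the reference, which is a common default but is not stated in the text; the function name `msc` and the synthetic spectra are illustrative.

```python
import numpy as np


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    # Assumption: use the mean spectrum when no reference is supplied.
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        # Fit row ~ intercept + slope * ref, then undo offset and gain.
        slope, intercept = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


# Hypothetical spectra: one underlying shape with sample-specific gain and offset.
rng = np.random.default_rng(0)
shape = 0.5 + 0.2 * np.sin(np.linspace(0.0, 6.0, 620))
gains = rng.uniform(0.8, 1.2, (90, 1))
offsets = rng.uniform(-0.05, 0.05, (90, 1))
spectra = offsets + gains * shape

corrected = msc(spectra)
print(np.ptp(spectra, axis=0).max(), np.ptp(corrected, axis=0).max())
```

In this idealized case every corrected spectrum collapses onto the reference, because the only differences between samples were the additive offset and multiplicative gain that MSC is designed to remove.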

The enhanced performance of the fusion model stems from the complementary information provided by Vis-NIR and NIR spectroscopy. Vis-NIR spectra (400–1000 nm) are primarily sensitive to electronic transitions, offering information related to colorants, pigments, and certain organic functional groups. In contrast, NIR spectra (1000–1700 nm) are dominated by overtone and combination vibrations of fundamental molecular bonds, providing direct insights into the molecular composition, such as moisture, cellulose, nicotine, and other organic constituents. This complementarity is clearly reflected in the feature selection results and detection outcomes. For instance, in the quantitative analysis of fermented cut stem and expanded tobacco silk, the fusion model achieved higher accuracy compared to single-spectrum models. The improvement can be attributed to the fact that these components involve both physical–structural attributes and chemical composition differences. Vis-NIR contributes to correcting scatter effects and detecting color-related changes, while NIR captures detailed chemical vibrations. The fusion strategy thus enables a more holistic characterization, effectively suppressing noise, enhancing feature robustness, and improving predictive reliability—particularly for components with complex or overlapping spectral signatures.

3.3.2 Quantitative analysis results after spectral processing

After outlier removal and processing of Vis-NIR, NIR, and fusion spectra of Vis-NIR and NIR, the prediction results of the PLSR model are shown in Table 4.

Table 4. Results of PLSR models after spectral processing.

Comparing R2 and RMSECV between Tables 3 and 4 shows that spectral processing substantially improved the prediction accuracy of the PLSR models. In Table 4, for tobacco silk, the fused Vis-NIR and NIR model had the highest prediction accuracy: R2 increased from 0.7727 for the original spectra to 0.8873, a relative improvement of 14.83%, and RMSECV decreased from 0.0334 to 0.0236, a relative reduction of 29.34%. Similarly, for cut stem, R2 increased from 0.5774 for the original spectra to 0.6279, an improvement of 8.75%, and RMSECV decreased from 0.0361 to 0.0339. This indicates that, for these two components, the fusion of Vis-NIR and NIR spectra significantly improved quantitative performance and outperformed the single-spectrum models. For fermented cut stem and expanded tobacco silk, the NIR model achieved the highest prediction accuracy. The R2 values increased from 0.6567 and 0.6120 for the original spectra to 0.8206 and 0.7303, improvements of 24.96% and 19.33%, respectively. The RMSECV decreased from 0.0286 and 0.0349 to 0.0200 and 0.0286, relative reductions of 30.07% and 18.05%, respectively. For these two blending components, the modeling accuracies of the fused spectra were slightly lower than those achieved by the NIR model alone. This may be attributed to two main factors. First, the spectral response intensity of cut stem in the NIR region is inherently weaker than in the Vis-NIR range, which reduces its contribution to the fused feature set. Second, the overlapping spectral bands between Vis-NIR and NIR in the 975–1020 nm region carry similar chemical information but with different signal-to-noise ratios, which may introduce redundant or conflicting spectral features after fusion and thereby hinder the model's ability to extract key discriminative signals.
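The percentages quoted in this paragraph are relative changes, computed as the absolute difference divided by the starting value. The short check below recomputes them from the R2 and RMSECV values given in the text; the function name and the dictionary layout are only illustrative.

```python
def relative_change(before, after):
    """Relative change in percent, measured against the starting value."""
    return abs(after - before) / before * 100.0


# (before, after) pairs quoted in the text.
reported = {
    "tobacco silk R2 (fusion)": (0.7727, 0.8873),
    "tobacco silk RMSECV (fusion)": (0.0334, 0.0236),
    "cut stem R2 (fusion)": (0.5774, 0.6279),
    "fermented cut stem R2 (NIR)": (0.6567, 0.8206),
    "expanded tobacco silk R2 (NIR)": (0.6120, 0.7303),
    "fermented cut stem RMSECV (NIR)": (0.0286, 0.0200),
    "expanded tobacco silk RMSECV (NIR)": (0.0349, 0.0286),
}

changes = {name: round(relative_change(b, a), 2) for name, (b, a) in reported.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:.2f}%")
```

Note that these are relative, not absolute, changes: an R2 gain from 0.7727 to 0.8873 is 0.1146 in absolute terms and 14.83% relative to the starting value.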

Compared to previous studies, the present work employed hyperspectral imaging technology, which allows for the flexible selection of ROI and enables the fusion of Vis-NIR and NIR spectra. This approach provides a more tailored and potentially more accurate detection method for tobacco shred analysis.

To address the lower accuracy of the fused spectra for fermented cut stem and expanded tobacco silk and to improve fusion performance, several targeted strategies could be explored in future work. For instance, weighted band fusion could be applied to adjust the contribution of NIR features according to their relevance to the target component, rather than treating all wavelengths equally. Alternatively, targeted band screening could be implemented prior to fusion, such as removing the overlapping NIR region or applying segmental noise filtering to the NIR spectrum, to enhance the signal quality and relevance of the fused spectral dataset.
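One simple way the weighted fusion idea could be prototyped is block-level weighting, sketched below. This is an assumption-laden illustration rather than a method from the study: each block is mean-centred, scaled to unit Frobenius norm, and then given a chosen share of the total variance through the hypothetical parameter `w_nir`. Per-wavelength weights based on relevance to the target would be a finer-grained variant.

```python
import numpy as np


def weighted_fusion(vis_nir, nir, w_nir=0.5):
    """Concatenate two blocks so that NIR holds a share w_nir of total variance."""
    if not 0.0 < w_nir < 1.0:
        raise ValueError("w_nir must lie strictly between 0 and 1")
    blocks = []
    for block, weight in ((vis_nir, 1.0 - w_nir), (nir, w_nir)):
        centred = block - block.mean(axis=0)
        norm = np.linalg.norm(centred)  # Frobenius norm of the centred block
        blocks.append(centred * np.sqrt(weight) / norm)
    return np.hstack(blocks)


# Hypothetical blocks with very different scales.
rng = np.random.default_rng(0)
vis_nir = rng.random((90, 400))
nir = 10.0 * rng.random((90, 220))

fused = weighted_fusion(vis_nir, nir, w_nir=0.3)
nir_share = float(np.sum(fused[:, 400:] ** 2))
print(fused.shape, round(nir_share, 3))
```

The weight could then be tuned per target component by cross-validation, which is one way to test whether down-weighting a noisier block recovers the accuracy lost after fusion.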

4 Conclusion

This study demonstrates that feature-level fusion of Vis-NIR and NIR spectroscopy offers a substantial improvement over single-spectrum methods for predicting the composition of finished tobacco products. For tobacco silk, the fused-spectrum PLSR model achieved the highest predictive accuracy, with R2 increasing from 0.7727 to 0.8873, a relative improvement of 14.83%. This enhancement stems from the complementary information captured by each spectral range. Through feature optimization, the fusion process also suppressed spectral noise and improved model robustness, confirming the advantage of multi-spectral analysis for this complex matrix.

These findings carry both theoretical and practical significance. Theoretically, they validate spectral fusion as an effective strategy to overcome the inherent limitations of individual spectroscopic techniques. Practically, the developed model provides the tobacco industry with a rapid, non-destructive tool for precise blend monitoring, which is crucial for maintaining consistent product quality and optimal combustion characteristics.

It is important to acknowledge certain limitations of the current work. The model was developed using a limited sample set from a specific production context, and its evaluation relied on cross-validation rather than external testing, which may affect its generalizability to other tobacco varieties or production batches. Furthermore, spectral acquisition was conducted offline under controlled conditions; implementing this method for real-time, in-line monitoring would require substantial engineering optimization to address challenges such as varying sample presentation and environmental interference on active production lines.

Future research should therefore focus on validating the model with larger and more diverse external sample sets, optimizing the computational workflow for real-time industrial deployment, and exploring the integration of additional spectroscopic modalities to expand the analytical scope. Ultimately, this work establishes a robust analytical framework that provides both methodological insights and a practical foundation for advancing toward accurate, non-destructive quality monitoring in tobacco production and similar blending processes.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

YJ: Investigation, Visualization, Data curation, Writing – original draft. QX: Writing – original draft, Project administration, Conceptualization, Methodology. XH: Writing – original draft, Validation, Formal Analysis, Software. RG: Investigation, Writing – review & editing. JW: Writing – review & editing, Visualization. XZ: Data curation, Writing – review & editing. YL: Formal Analysis, Writing – review & editing. LL: Validation, Writing – review & editing. XC: Writing – review & editing, Software. JY: Project administration, Supervision, Writing – original draft, Conceptualization. YH: Supervision, Writing – review & editing, Project administration.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This research was supported by China Tobacco Sichuan Industrial Co., Ltd. (BA202501500002, jc202204).

Acknowledgments

The authors would like to acknowledge China Tobacco Sichuan Industrial Co., Ltd. for providing the experimental samples, as well as for the valuable suggestions and support provided during the research process.

Conflict of interest

Authors QX, RG, JW, XZ, YL, LL and JY were employed by the company China Tobacco Sichuan Industrial Co., Ltd.

The remaining author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2025.1736546/full#supplementary-material

References

Ahmed, M., Monjur, O., and Kamruzzaman, M. (2024). Deep learning-based hyperspectral image reconstruction for quality assessment of agro-product. J. Food Eng. 382, 112223. doi: 10.1016/j.jfoodeng.2024.112223

Crossref Full Text | Google Scholar

Alrawashdeh, M. J., Saad, S. A. A., Mohammed, A. M. A., and Alrawashdeh, W. J. A. (2024). Canonical correlation analysis based on robust covariance matrix by using deterministic of minimum covariance determinant. Partial Differential Equations Appl. Mathematics 11, 100820. doi: 10.1016/j.padiff.2024.100820

Crossref Full Text | Google Scholar

Arnese-Feffin, E., Facco, P., Bezzo, F., and Barolo, M. (2024). Systematizing product design by latent-variable modeling – A unifying framework for the formulation and solution of PLS model inversion problems. Chem. Eng. Sci. 299, 120505. doi: 10.1016/j.ces.2024.120505

Crossref Full Text | Google Scholar

Belay, N. F., Busche, S., Manici, V., Shaukat, M., Arndt, S.-O., and Schmidt, C. (2021). Evaluation of Transmission Raman spectroscopy and NIR Hyperspectral Imaging for the assessment of content uniformity in solid oral dosage forms✰. Eur. J. Pharm. Sci. 166, 105963. doi: 10.1016/j.ejps.2021.105963

PubMed Abstract | Crossref Full Text | Google Scholar

Cao, J., Ding, H., Liu, Y., Wang, L., Liu, Q., Yang, G., et al. (2025). NIR spectroscopy and machine learning for tobacco product authentication: A non-destructive approach with broader safety applications. Microchemical J. 219, 116049. doi: 10.1016/j.microc.2025.116049

Crossref Full Text | Google Scholar

Chen, J., Li, Y., He, X., Jiao, F., Xu, M., Hu, B., et al. (2021). Influences of different curing methods on chemical compositions in different types of tobaccos. Ind. Crops Products 167, 113534. doi: 10.1016/j.indcrop.2021.113534

Crossref Full Text | Google Scholar

Chen, X., Xie, Z., Tauler, R., He, Y., Nie, P., Peng, Y., et al. (2025). An automated preprocessing framework for near infrared spectroscopic data. Chemometrics Intelligent Lab. Syst. 267, 105542. doi: 10.1016/j.chemolab.2025.105542

Crossref Full Text | Google Scholar

Deng, G., Li, J., Liu, H., and Wang, Y. (2024). A fast method for predicting adenosine content in porcini mushrooms using Fourier transform near-infrared spectroscopy combined with regression model. LWT 201, 116243. doi: 10.1016/j.lwt.2024.116243

Crossref Full Text | Google Scholar

Fei, Y., Li, Z., Zhu, T., Chen, Z., and Ni, C. (2025). Nondestructive detection and classification of impurities-containing seed cotton based on hyperspectral imaging and one-dimensional convolutional neural network. Digital Commun. Networks 11, 308–316. doi: 10.1016/j.dcan.2024.05.008

Crossref Full Text | Google Scholar

Fernández-Habas, J., Carriere Cañada, M., García Moreno, A. M., Leal-Murillo, J. R., González-Dugo, M. P., Abellanas Oar, B., et al. (2022). Estimating pasture quality of Mediterranean grasslands using hyperspectral narrow bands from field spectroscopy by Random Forest and PLS regressions. Comput. Electron. Agric. 192, 106614. doi: 10.1016/j.compag.2021.106614

Crossref Full Text | Google Scholar

Gao, S. and Xie, W. (2024). SSC and pH prediction and maturity classification of grapes based on hyperspectral imaging. Smart Agric. Technol. 8, 100457. doi: 10.1016/j.atech.2024.100457

Crossref Full Text | Google Scholar

Huang, X., Huang, G., Chen, X., Xie, Z., Ali, S., Chen, X., et al. (2024). An adaptive strategy to improve the partial least squares model via minimum covariance determinant. Chemometrics Intelligent Lab. Syst. 249, 105120. doi: 10.1016/j.chemolab.2024.105120

Crossref Full Text | Google Scholar

Hussain, F., Ali, Y., Li, Y., and Haque, M. M. (2024). Revisiting the hybrid approach of anomaly detection and extreme value theory for estimating pedestrian crashes using traffic conflicts obtained from artificial intelligence-based video analytics. Accident Anal. Prev. 199, 107517. doi: 10.1016/j.aap.2024.107517

PubMed Abstract | Crossref Full Text | Google Scholar

Jia, W., Ferragina, A., Hamill, R., and Koidis, A. (2024). Modelling and numerical methods for identifying low-level adulteration in ground beef using near-infrared hyperspectral imaging (NIR-HSI). Talanta 276, 126199. doi: 10.1016/j.talanta.2024.126199

PubMed Abstract | Crossref Full Text | Google Scholar

Li, J., Ma, Z., Dai, H., Li, H., Qiu, J., and Pang, X. (2024b). Application of PLSR in correlating sensory and chemical properties of middle flue-cured tobacco leaves with honey-sweet and burnt flavour. Heliyon 10, e29547. doi: 10.1016/j.heliyon.2024.e29547

PubMed Abstract | Crossref Full Text | Google Scholar

Li, C., Ma, J., Zhang, E., Du, J., Zhang, L., Zhao, M., et al. (2025). Early detection of mold in cured tobacco leaves based on temporal hyperspectral imaging. Infrared Phys. Technol. 150, 106035. doi: 10.1016/j.infrared.2025.106035

Crossref Full Text | Google Scholar

Li, J., Sun, W., Liu, S., Cheng, T., Tang, L., Jiang, W., et al. (2024c). Estimation of LAI of tobacco plant using selected spectral subsets of visible and near-infrared reflectance spectroscopy. Smart Agric. Technol. 8, 100502. doi: 10.1016/j.atech.2024.100502

Crossref Full Text | Google Scholar

Li, H., Wang, Q., Han, L., Chen, Z., Wang, G., Wang, Q., et al. (2024a). Quality characterization of tobacco flavor and tobacco leaf position identification based on homemade electronic nose. Sci. Rep. 14, 1–12. doi: 10.1038/s41598-024-70180-5

PubMed Abstract | Crossref Full Text | Google Scholar

Liang, J., Wang, Y., Shi, Y., Huang, X., Li, Z., Zhang, X., et al. (2023). Non-destructive discrimination of homochromatic foreign materials in cut tobacco based on VIS-NIR hyperspectral imaging. J. Sci. Food Agric. 103, 4545–4552. doi: 10.1002/jsfa.12528

PubMed Abstract | Crossref Full Text | Google Scholar

Liang, Y., Zhao, L., Guo, J., Wang, H., Liu, S., Wang, L., et al. (2022). Just-in-time learning-integrated partial least-squares strategy for accurately predicting 71 chemical constituents in Chinese tobacco by near-infrared spectroscopy. ACS Omega 7, 38650–38659. doi: 10.1021/acsomega.2c04139

PubMed Abstract | Crossref Full Text | Google Scholar

Losso, K., Cardini, J., Huber, S., Kappacher, C., Jakschitz, T., Rainer, M., et al. (2022). Rapid differentiation and quality control of tobacco products using Direct Analysis in Real Time Mass Spectrometry and Liquid Chromatography Mass Spectrometry. Talanta 238, 123057. doi: 10.1016/j.talanta.2021.123057

PubMed Abstract | Crossref Full Text | Google Scholar

Medina–García, M., Amigo, J. M., Martínez-Domingo, M. A., Valero, E. M., and Jiménez–Carvelo, A. M. (2025). Strategies for analysing hyperspectral imaging data for food quality and safety issues – A critical review of the last 5 years. Microchemical J. 214, 113994. doi: 10.1016/j.microc.2025.113994

Crossref Full Text | Google Scholar

Mei, J.-F., Li, Z.-H., Li, J.-K., Su, Z.-Q., Xu, Bo, Du, J.-S., et al. (2021). Components discrimination for formula tobacco based on hyperspectral imaging. J. Instrumental Anal. 40, 7. doi: 10.19969/j.fxcsxb.20110702

Crossref Full Text | Google Scholar

Nargesi, M. H., Amiriparian, J., Bagherpour, H., and Kheiralipour, K. (2024). Detection of different adulteration in cinnamon powder using hyperspectral imaging and artificial neural network method. Results Chem. 9, 101644. doi: 10.1016/j.rechem.2024.101644

Crossref Full Text | Google Scholar

Niu, Q., Liu, J., Jin, Y., Chen, X., Zhu, W., and Yuan, Q. (2022). Tobacco shred varieties classification using Multi-Scale-X-ResNet network and machine vision. Front. Plant Sci. 13. doi: 10.3389/fpls.2022.962664

PubMed Abstract | Crossref Full Text | Google Scholar

Özdoğan, G., Lin, X., and Sun, D.-W. (2021). Rapid and noninvasive sensory analyses of food products by hyperspectral imaging: Recent application developments. Trends Food Sci. Technol. 111, 151–165. doi: 10.1016/j.tifs.2021.02.044

Crossref Full Text | Google Scholar

Rocha de Oliveira, R. and de Juan, A. (2021). SWiVIA – Sliding window variographic image analysis for real-time assessment of heterogeneity indices in blending processes monitored with hyperspectral imaging. Analytica Chimica Acta 1180, 338852. doi: 10.1016/j.aca.2021.338852

PubMed Abstract | Crossref Full Text | Google Scholar

Seong, G.-U., Kim, S. S., Yun, D.-Y., Lee, G., Park, S.-K., Lim, J., et al. (2025). Characterization of bioactive compounds in phytophthora blight-infected red pepper powder (Capsicum annuum) and nondestructive discrimination of adulteration ratios using hyperspectral imaging. Food Bioscience 63, 105662. doi: 10.1016/j.fbio.2024.105662

Crossref Full Text | Google Scholar

Shi, T., Gao, Y., Song, J., Ao, M., Hu, X., Yang, W., et al. (2024). Using VIS-NIR hyperspectral imaging and deep learning for non-destructive high-throughput quantification and visualization of nutrients in wheat grains. Food Chem. 461, 140651. doi: 10.1016/j.foodchem.2024.140651

PubMed Abstract | Crossref Full Text | Google Scholar

Shi, X., Song, J., Wang, H., Lv, X., Zhu, Y., Zhang, W., et al. (2023). Improving soil organic matter estimation accuracy by combining optimal spectral preprocessing and feature selection methods based on pXRF and vis-NIR data fusion. Geoderma 430, 116301. doi: 10.1016/j.geoderma.2022.116301

Crossref Full Text | Google Scholar

Shiny, J. J., S, H., A, S., and S., P. Y. (2024). “Food adulterant detection using hyperspectral imaging system in okra,” in 2024 International Conference on Sustainable Communication Networks and Application (ICSCNA) Theni, India. 905–910. doi: 10.1109/ICSCNA63714.2024.10864276

Crossref Full Text | Google Scholar

Tian, X., Aheto, J. H., Bai, J., Dai, C., Ren, Y., and Chang, X. (2021). Quantitative analysis and visualization of moisture and anthocyanins content in purple sweet potato by Vis–NIR hyperspectral imaging. J. Food Process. Preserv. 45. doi: 10.1111/jfpp.15128

Crossref Full Text | Google Scholar

Tiplady, K. M., Sherlock, R. G., Littlejohn, M. D., Pryce, J. E., Davis, S. R., Garrick, D. J., et al. (2019). Strategies for noise reduction and standardization of milk mid-infrared spectra from dairy cattle. J. Dairy Sci. 102, 6357–6372. doi: 10.3168/jds.2018-16144

PubMed Abstract | Crossref Full Text | Google Scholar

Wang, H., Wu, Q., Yang, W., Yu, J., Wu, T., Xiong, Z., et al. (2024). NIR and MIR spectral feature information fusion strategy for multivariate quantitative analysis of tobacco components. Chemometrics Intelligent Lab. Syst. 253, 105222. doi: 10.1016/j.chemolab.2024.105222

Crossref Full Text | Google Scholar

Workman, J., Jr. and Weyer, L. (2007). Practical Guide to Interpretive Near-Infrared Spectroscopy. 0 Edn (Boca Raton: CRC Press). doi: 10.1201/9781420018318

Crossref Full Text | Google Scholar

Wu, R., Tian, Z., Zhang, C., Li, D., Tian, N., Xing, L., et al. (2022). Uniformity evaluation of stem distribution in cut tobacco and single cigarette by near infrared spectroscopy. Vibrational Spectrosc. 121, 103401. doi: 10.1016/j.vibspec.2022.103401

Crossref Full Text | Google Scholar

Xie, Z., Feng, X., and Chen, X. (2022b). Subsampling for partial least-squares regression via an influence function. Knowledge-Based Syst. 245, 108661. doi: 10.1016/j.knosys.2022.108661

Crossref Full Text | Google Scholar

Xie, A., Sun, J., Wang, T., and Liu, Y. (2022a). Visualized detection of quality change of cooked beef with condiments by hyperspectral imaging technique. Food Sci. Biotechnol. 31, 1257–1266. doi: 10.1007/s10068-022-01115-x

PubMed Abstract | Crossref Full Text | Google Scholar

Yao, B., Jiang, W., Yan, D., Zhang, L., Ma, E., Deng, X., et al. (2024). Determining nitrogen status using a universal critical nitrogen dilution curve for predicting flue-cured tobacco yield and quality. J. Agric. Food Res. 18, 101296. doi: 10.1016/j.jafr.2024.101296

Crossref Full Text | Google Scholar

Ye, X., Chen, X., Zhang, J., Cheng, M., Fu, B., and Ai, B. (2025). Polarization standard filtering enables multi-attribute tobacco grading by near-infrared spectroscopy. Analytica Chimica Acta 1377, 344676. doi: 10.1016/j.aca.2025.344676

PubMed Abstract | Crossref Full Text | Google Scholar

Zhao, M., Cang, H., Chen, H., Zhang, C., Yan, T., Zhang, Y., et al. (2023). Determination of quality and maturity of processing tomatoes using near-infrared hyperspectral imaging with interpretable machine learning methods. LWT 183, 114861. doi: 10.1016/j.lwt.2023.114861

Crossref Full Text | Google Scholar

Zheng, R., Jia, Y., Ullagaddi, C., Allen, C., Rausch, K., Singh, V., et al. (2024). Optimizing feature selection with gradient boosting machines in PLS regression for predicting moisture and protein in multi-country corn kernels via NIR spectroscopy. Food Chem. 456, 140062. doi: 10.1016/j.foodchem.2024.140062

PubMed Abstract | Crossref Full Text | Google Scholar

Zhu, H., Lin, C., Dong, Z., Xu, J.-L., and He, Y. (2025). Early yield prediction of oilseed rape using UAV-based hyperspectral imaging combined with machine learning algorithms. Agriculture 15, 1100. doi: 10.3390/agriculture15101100

Crossref Full Text | Google Scholar

Keywords: blending proportions, hyperspectral imaging, multispectral fusion, quantitative analysis, tobacco

Citation: Jiang Y, Xiao Q, Huang X, Gu R, Wen J, Zhang X, Liu Y, Li L, Chen X, Yang J and He Y (2026) Quantitative analysis of tobacco blending proportions based on hyperspectral imaging and data fusion. Front. Plant Sci. 16:1736546. doi: 10.3389/fpls.2025.1736546

Received: 31 October 2025; Revised: 15 December 2025; Accepted: 23 December 2025; Published: 15 January 2026.

Edited by:

Sergio Ruffo Roberto, State University of Londrina, Brazil

Reviewed by:

Agustami Sitorus, National Research and Innovation Agency (BRIN), Indonesia
Yuanhui Wang, Henan University of Technology, China

Copyright © 2026 Jiang, Xiao, Huang, Gu, Wen, Zhang, Liu, Li, Chen, Yang and He. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Juan Yang, yangjuanmail@sina.com

†These authors have contributed equally to this work and share first authorship
