ORIGINAL RESEARCH article

Front. Plant Sci., 05 February 2026

Sec. Technical Advances in Plant Science

Volume 17 - 2026 | https://doi.org/10.3389/fpls.2026.1729375

Novel estimation of tomato soluble solids content using linearly transformed reflectance-based spectral indices

Naisen Liu1*, Yuan Fang1, Yongzhen Zhao1, Yuezhen Chen2, Yangxia Zheng3, Xuedong Zha4, Jingyu Guo1, Xia Li4
  • 1School of Life Science, Huaiyin Normal University, Huai’an, China
  • 2Huai’an Institute of Vegetable Sciences, Huai’an, China
  • 3College of Horticulture, Sichuan Agricultural University, Chengdu, China
  • 4Huai’an Agricultural Information Center, Huai’an, China

Rapid and non-destructive estimation of soluble solids content (SSC) is essential for tomato quality evaluation, yet the generalization ability of many existing spectral models remains limited when applied across multiple cultivars. In this study, hyperspectral reflectance was combined with a genetic algorithm (GA)–based optimization strategy to develop a robust SSC prediction framework applicable to diverse tomato types. Spectral reflectance and SSC (°Brix) were measured for 152 fruits representing 13 cultivars, including large red, medium red, red cherry, and yellow cherry types. To overcome the structural rigidity of conventional fixed-form spectral indices, reflectance spectra were linearly transformed to construct three novel indices: the linearly transformed difference spectral index (ltDSI), linearly transformed normalized difference spectral index (ltNDSI), and linearly transformed ratio spectral index (ltRSI). For each index, GA was employed to simultaneously optimize wavelength combinations and transformation coefficients. Under identical calibration and validation datasets, the GA-optimized indices consistently outperformed conventional two-band spectral indices as well as full-spectrum partial least squares models, while exhibiting markedly reduced sensitivity to tomato type. Across all validation datasets, the proposed models achieved coefficients of determination of approximately 0.80, with root mean square errors around 0.6°Brix and mean relative errors close to 10%. These results demonstrate that joint optimization of spectral index structure and parameters is an effective strategy for improving model robustness and transferability. The proposed framework provides a scalable solution for non-destructive SSC assessment and offers practical guidance for the development of low-cost, field-deployable spectral sensing tools for fruit quality phenotyping across cultivars and growing conditions.

1 Introduction

Tomato (Solanum lycopersicum L.) is one of the most widely cultivated horticultural crops and is valued for its flavor, nutritional composition, and medicinal properties (Friedman, 2013, 2015; Kwon et al., 2024). Among its quality attributes, the soluble solids content (SSC) is a key indicator influencing consumer preference and marketability (Xia et al., 2020). The accurate assessment of SSC is therefore of great significance for crop management (Wang F. et al., 2022), postharvest grading (Tang et al., 2018), consumer choice, and food processing (Kim et al., 2023).

Conventional methods for measuring SSC, such as high-performance liquid chromatography (Ali et al., 2022) and refractometry (Jaywant et al., 2022), provide accurate results. However, their destructive sampling and low measurement efficiency limit their practical use. Spectroscopy has emerged as a powerful nondestructive technique for assessing SSC and other fruit quality attributes (Lu et al., 2000; Rongtong et al., 2018; Li S. et al., 2024). Despite its advantages, several challenges remain. Fruit spectra are strongly affected by fruit size and internal structure, and the generalization ability of existing models across cultivars is often limited. Fruit spectral measurements typically rely on reflectance (Lu, 2001; He et al., 2005) or transmittance (Khuriyati and Matsuoka, 2004; Tian et al., 2007). Transmittance spectroscopy enables deeper light–tissue interactions and can capture SSC information more effectively (Tian et al., 2023). However, light penetration is weak in large fruits (Xu et al., 2024), which reduces measurement accuracy. Even in small fruits, variability in internal structure introduces substantial interference. Tomatoes contain complex internal partitions (Khuriyati and Matsuoka, 2004), grapes contain seeds, and peaches have large pits. These structural differences make the optimal measurement position difficult to define, especially because internal features are invisible. Transmittance measurements also require sealed, light-proof configurations (Cai et al., 2024; Zheng et al., 2024). Fruits must be placed in dedicated holders, which limits measurement speed and makes on-plant assessment nearly impossible. By contrast, reflectance spectroscopy captures only surface information and is unaffected by internal structure, but it contains less SSC-related information, which places higher demands on model construction.

The adaptability of prediction models to different cultivars is another critical concern because it determines their practical utility. Many studies use only one cultivar (He et al., 2005; Radzevičius et al., 2016) or a few cultivars (Li S. et al., 2024), providing limited evidence of model generalization. Choi et al. (2017) reported cross-cultivar transferability in Asian pears, whereas studies on sugarcane (Chea et al., 2020), melon (Kusumiyati et al., 2022), and tomato (Ibáñez et al., 2019) demonstrated cultivar-specific prediction accuracy. These inconsistent findings may result from small sample diversity or differences in cultivar similarity. Evidence from homogenized tomato samples supports this view: models built using puree and juice spectra successfully predicted SSC across 53 genetically diverse cultivars (Sun et al., 2021). This highlights the need for prediction models developed using intact fruits from multiple types and diverse cultivars.

Current modeling approaches for fruit quality prediction rely on spectral indices or machine learning methods such as partial least squares regression (PLS) and random forests (Li H. et al., 2024; Elsayed et al., 2025). Although machine learning models often achieve high accuracy, they typically require computationally intensive processing or numerous spectral bands, which increases hardware cost and limits sensor integration. Spectral-index-based models use fewer bands and are computationally simple, making them more suitable for embedded sensors. However, their accuracy can be limited and must be improved. Conventional two-band indices, such as the difference spectral index (DSI), normalized difference spectral index (NDSI), and ratio spectral index (RSI), remain widely used, but recent evidence indicates that newly designed indices can further enhance SSC prediction (Wang Q. et al., 2022). Selecting sensitive wavelengths from a large spectral range remains challenging because of the vast number of possible band combinations. Genetic algorithms (GA), which simulate evolutionary processes such as selection, crossover, and mutation, provide an effective global optimization strategy (Shahi et al., 2016). GA-based wavelength selection has been successfully applied to improve SSC prediction in multiple studies (Roger and Bellon-Maurel, 2000; Ouyang et al., 2012; Almoujahed et al., 2025).

The objectives of this study were threefold: (i) to construct tomato SSC prediction models using linearly transformed spectral indices optimized by a GA and to compare their performance with models based on conventional two-band indices and PLS; (ii) to assess the extent to which model performance depends on tomato type; and (iii) to generate a virtual spectral reflectance landscape to visualize the distribution patterns of sensitive wavelength combinations. This study proposes a GA-optimized framework based on linearly transformed spectral indices for robust estimation of tomato SSC across multiple tomato types, providing a practical tool for harvest and postharvest quality management.

2 Materials and methods

2.1 Samples

Thirteen fresh-market tomato cultivars were selected: ‘Gaofen No.1’ (a locally recommended cultivar), ‘Mingzhi’, ‘FQ581’, ‘HTO-32’, ‘HTO-40’, ‘HTO-41’, ‘HTO-44’, ‘HTO-47’, ‘Gaotianfen No.7’, ‘C575’, ‘HTO-19’, ‘HHTO-35’, and ‘HHTO-36’. All plants were cultivated in multi-span plastic greenhouse structures with steel frames at the Vegetable Research Institute of Huai’an, Jiangsu Province, China. Seeds were sown on January 8, 2025, and transplanted on March 20, 2025. A randomized block design was adopted. Each cultivar was planted in a single plot measuring 1.8 m × 4.0 m. Plants were grown on flat beds with trellising, with 40 cm spacing between plants and 60 cm between rows. Irrigation and weed control followed standard local management practices. The basal fertilization consisted of 600 kg/ha compound fertilizer (N-P2O5-K2O = 15-15-15; Anhui Huilong Zhongcheng Technology Co., Ltd., Hefei, Anhui, China) and 7500 kg/ha organic fertilizer containing Trichoderma spp. (Huai’an Chaimihe Agricultural Technology Co., Ltd., Huai’an, Jiangsu, China). Additional fertilization included 225 kg/ha compound fertilizer (N-P2O5-K2O = 17-17-17; Shandong Runhe Fertilizer Co., Ltd., Jining, Shandong, China) applied at the fruit enlargement stage of the first truss, 90 kg/ha potassium fulvic acid water-soluble fertilizer (Shandong Runhe Fertilizer Co., Ltd.) applied after fruit set of the second truss, and 75 kg/ha potassium-rich water-soluble fertilizer (Shandong Runhe Fertilizer Co., Ltd.) applied after every one to two harvests. In addition, foliar sprays of KH2PO4 and calcium fertilizer (Anhui Ruilin Modern Agricultural Technology Co., Ltd., Hefei, Anhui, China) were applied once to prevent blossom-end rot.

Fruits were harvested at the full-ripe stage. Six plants were randomly selected from each plot. For large-fruited tomatoes, one to two fruits were collected per plant; for medium-sized and cherry tomatoes, two to three fruits were collected per plant. Cracked fruits were removed after harvest.

2.2 Spectral reflectance and SSC measurement

Spectral reflectance was measured on the day of harvest or the following day using a FieldSpec® 4 spectroradiometer (ASD Inc., Alexandria, VA, USA). The instrument was calibrated using a white reference panel, and the fiber-optic probe was positioned vertically above the equatorial region of the fruit (Figure 1). Ten consecutive measurements were taken per fruit and averaged to obtain a single reflectance spectrum. The fruit was placed on a black concave tray to minimize movement during measurement. All spectral data were collected between 10:00 and 14:00 under clear-sky conditions.

Figure 1
A scientific setup outside, featuring a spectroradiometer mounted on a tripod facing a tomato placed on a black-covered table. Nearby is a bag with field equipment and accessories.

Figure 1. In situ measurement of tomato fruit spectra.

Within one hour after spectral measurement, each fruit was juiced, and SSC (°Brix) was determined using a refractometer. Each fruit was measured three times, and the average was calculated. To evaluate model robustness, the dataset was randomly partitioned 10 times, generating ten pairs of modeling and validation subsets. In each iteration, 75% of the samples were used for model development, and the remaining 25% were reserved for independent validation. These ten randomized splits were denoted as Dataset 1 through Dataset 10.
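The repeated partitioning described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `repeated_splits`, the fixed seed, and the use of a plain (non-stratified) shuffle are assumptions, since the text does not state whether the splits were stratified by cultivar or type.

```python
import random


def repeated_splits(n_samples, n_repeats=10, train_fraction=0.75, seed=0):
    """Return n_repeats (calibration, validation) index pairs.

    The seed and the plain random shuffle are assumptions made for
    reproducibility of this sketch; the article reports neither.
    """
    rng = random.Random(seed)
    n_train = round(n_samples * train_fraction)
    splits = []
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # First 75% of the shuffled indices for calibration, the rest for validation.
        splits.append((sorted(order[:n_train]), sorted(order[n_train:])))
    return splits


# 152 fruits, as reported in the article.
splits = repeated_splits(152)
```

With 152 fruits, each split holds 114 calibration and 38 validation samples.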

2.3 Spectral data preprocessing

The FieldSpec® 4 spectroradiometer covers a spectral range of 350–2500 nm, including visible, near-infrared, and shortwave infrared regions. Due to the high correlations among adjacent bands, raw data were averaged every 10 bands, and the midpoint wavelength was recorded. For example, the mean reflectance at 350–359 nm was assigned to 355 nm. Spectral regions strongly affected by water absorption (1345–1425, 1795–1975, and 2345–2500 nm) were removed.
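A minimal sketch of this preprocessing is given below. It assumes 1 nm sampling from 350 to 2500 nm, labels each bin as its first wavelength plus 5 nm (matching the 350–359 nm to 355 nm example), drops the incomplete final bin, and treats the water-absorption limits as inclusive. The function names and the synthetic spectrum are illustrative only.

```python
# Water-absorption regions (nm) removed in the article; inclusive limits are an assumption.
WATER_BANDS = ((1345, 1425), (1795, 1975), (2345, 2500))


def bin_spectrum(wavelengths, reflectance, width=10):
    """Average consecutive groups of `width` bands.

    Each bin is labelled first wavelength + width // 2 (e.g. 350-359 nm -> 355 nm),
    which assumes 1 nm spacing. A trailing incomplete group is dropped.
    """
    binned_wl, binned_r = [], []
    for start in range(0, len(wavelengths) - width + 1, width):
        chunk = reflectance[start:start + width]
        binned_wl.append(wavelengths[start] + width // 2)
        binned_r.append(sum(chunk) / width)
    return binned_wl, binned_r


def drop_water_bands(wavelengths, reflectance, bands=WATER_BANDS):
    """Remove bins whose centre wavelength falls inside any excluded region."""
    kept = [(w, r) for w, r in zip(wavelengths, reflectance)
            if not any(lo <= w <= hi for lo, hi in bands)]
    return [w for w, _ in kept], [r for _, r in kept]


# Synthetic spectrum for illustration only (not measured data).
raw_wl = list(range(350, 2501))
raw_r = [w / 10000 for w in raw_wl]
wl, r = bin_spectrum(raw_wl, raw_r)
wl_kept, r_kept = drop_water_bands(wl, r)
```

Under these assumptions, 215 binned bands are produced and 171 remain after the water-absorption regions are removed.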

2.4 Construction of linearly transformed spectral indices–based SSC prediction models

Reflectance values at selected bands were subjected to linear transformation and used to construct new two-band spectral indices analogous to the DSI, NDSI, and RSI. These were denoted as ltDSIλ1,λ2, ltNDSIλ1,λ2 and ltRSIλ1,λ2, respectively (where “lt” denotes linearly transformed reflectance). The two wavelengths are represented by λ1 and λ2, and their order is unrestricted. The indices were defined as shown in Equations 1–3:

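\[ \mathrm{ltDSI}_{\lambda_1,\lambda_2} = k_1 R_{\lambda_1} - k_2 R_{\lambda_2} + b \tag{1} \]
\[ \mathrm{ltNDSI}_{\lambda_1,\lambda_2} = \frac{k_1 R_{\lambda_1} - k_2 R_{\lambda_2} + b_1}{k_1 R_{\lambda_1} + k_2 R_{\lambda_2} + b_2} \tag{2} \]
\[ \mathrm{ltRSI}_{\lambda_1,\lambda_2} = \frac{k_1 R_{\lambda_1} + b_1}{k_2 R_{\lambda_2} + b_2} \tag{3} \]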

where Rλ is the reflectance at wavelength λ, and k1, k2, b, b1, and b2 are the coefficients to be optimized. Linear regression models based on these indices were constructed to estimate SSC, and their coefficients of determination (R2) were calculated.
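The three indices and the R²-based fitness can be sketched as below. This is a hedged illustration that reads Equations 1–3 with a difference in the ltDSI and ltNDSI numerators and ratio forms for ltNDSI and ltRSI. The function names, the `fitness` helper, and the toy reflectance and SSC values are hypothetical and are not taken from the study.

```python
def lt_dsi(r1, r2, k1, k2, b):
    """Linearly transformed difference index (Equation 1)."""
    return k1 * r1 - k2 * r2 + b


def lt_ndsi(r1, r2, k1, k2, b1, b2):
    """Linearly transformed normalized difference index (Equation 2)."""
    return (k1 * r1 - k2 * r2 + b1) / (k1 * r1 + k2 * r2 + b2)


def lt_rsi(r1, r2, k1, k2, b1, b2):
    """Linearly transformed ratio index (Equation 3)."""
    return (k1 * r1 + b1) / (k2 * r2 + b2)


def r_squared(x, y):
    """R² of the least-squares line y ~ a*x + c (the squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant index carries no information
    return sxy ** 2 / (sxx * syy)


def fitness(index_fn, r1_values, r2_values, ssc, coeffs):
    """Fitness of one candidate: R² between the index and SSC."""
    index = [index_fn(a, b, *coeffs) for a, b in zip(r1_values, r2_values)]
    return r_squared(index, ssc)


# Toy values for illustration only (not measured data).
toy_r1 = [0.50, 0.52, 0.55, 0.53]
toy_r2 = [0.45, 0.44, 0.43, 0.47]
toy_ssc = [5.0, 6.2, 7.8, 5.4]
fit_a = fitness(lt_dsi, toy_r1, toy_r2, toy_ssc, (1.0, 1.0, 0.0))
fit_b = fitness(lt_dsi, toy_r1, toy_r2, toy_ssc, (1.0, 1.0, 5.41))
```

In this sketch `fit_a` and `fit_b` coincide, because adding a constant to an index does not change the R² of a simple linear fit.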

Model construction based on ltDSIλ1,λ2, ltNDSIλ1,λ2, and ltRSIλ1,λ2 required the simultaneous optimization of both band selection and transformation coefficients, representing a high-dimensional, non-deterministic polynomial-time hard (NP-hard) problem. To address this, a GA was employed in which each candidate solution was represented as a chromosome, and recombination (crossover), mutation, fitness evaluation, selection, and inter-population migration were iteratively applied to the population of chromosomes.

For ltDSIλ1,λ2, the coefficients k1, k2, and b were optimized, as represented by the chromosome structure in Supplementary Figure S1A. For ltNDSIλ1,λ2 and ltRSIλ1,λ2, the coefficients k1, k2, b1, and b2 were optimized, as illustrated in Supplementary Figure S1B. The model R2 was used as the fitness function.

The GA adopted a multi-population strategy with 10 subpopulations of 100 individuals each, a migration rate of 0.2, and migration every 10 generations. The maximum generation number (MaxGen) was set to 50, and the initial crossover rate (XOVR) and mutation rate (MUTR) were set to 0.7 and 0.2, respectively, and updated at each generation based on the following rules, as shown in Equations 4, 5:

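\[ \mathrm{XOVR} = \mathrm{XOVR} - \frac{0.7 - 0.3}{\mathrm{MaxGen}} \tag{4} \]
\[ \mathrm{MUTR} = \mathrm{MUTR} - \frac{0.2 - 0.005}{\mathrm{MaxGen}} \tag{5} \]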

This adaptive adjustment strategy was designed to maintain population diversity and improve optimization performance. The SSC prediction models were constructed and validated using the dataset described in Section 2.2.
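The rate update can be illustrated with a short sketch. It reads Equations 4 and 5 as a per-generation decrement of (0.7 − 0.3)/MaxGen for the crossover rate and (0.2 − 0.005)/MaxGen for the mutation rate, so that the rates fall linearly to 0.3 and 0.005 by the final generation. This reading, and the helper name `rate_schedule`, are assumptions.

```python
def rate_schedule(start, end, max_gen):
    """Rates for generations 0..max_gen, decreasing linearly from start to end."""
    step = (start - end) / max_gen
    rates = [start]
    for _ in range(max_gen):
        # Same update as rate = rate - (start - end) / MaxGen at each generation.
        rates.append(rates[-1] - step)
    return rates


MAX_GEN = 50
xovr = rate_schedule(0.7, 0.3, MAX_GEN)    # crossover rate per generation
mutr = rate_schedule(0.2, 0.005, MAX_GEN)  # mutation rate per generation
```

Under this reading, early generations favour exploration through frequent crossover and mutation, and later generations refine the best candidates.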

2.5 Construction of conventional two-band spectral indices–based SSC prediction models

Conventional two-band spectral indices are derived by applying basic arithmetic operations to reflectance values. In this study, three such indices, DSI, NDSI, and RSI, were constructed. For each index, all possible band-pair combinations were computed, and the R² of each combination with SSC was evaluated. The band pairs yielding the highest R² values were then selected to develop the SSC prediction model. For arbitrary band pairs (λ1, λ2; λ1 < λ2), the indices were calculated as shown in Equations 6–8:

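\[ \mathrm{DSI}_{\lambda_1,\lambda_2} = R_{\lambda_1} - R_{\lambda_2} \tag{6} \]
\[ \mathrm{NDSI}_{\lambda_1,\lambda_2} = \frac{R_{\lambda_1} - R_{\lambda_2}}{R_{\lambda_1} + R_{\lambda_2}} \tag{7} \]
\[ \mathrm{RSI}_{\lambda_1,\lambda_2} = \frac{R_{\lambda_1}}{R_{\lambda_2}} \tag{8} \]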

where Rλ is the reflectance at wavelength λ. Linear regression models were constructed to estimate SSC, and the R2 was calculated. The SSC prediction models were constructed and validated using Dataset 1 described in Section 2.2.
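The exhaustive band-pair search for a conventional index can be sketched as follows. This is a minimal illustration under stated assumptions: spectra are held as per-sample lists in a common band order, the helper names are hypothetical, and the four-band toy data are synthetic, constructed so that SSC is an exact linear function of one band difference.

```python
from itertools import combinations


def r_squared(x, y):
    """R² of the least-squares line y ~ a*x + c (the squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return 0.0 if sxx == 0 or syy == 0 else sxy ** 2 / (sxx * syy)


def dsi(r1, r2):
    """Difference spectral index (Equation 6)."""
    return r1 - r2


def best_band_pair(spectra, ssc, index_fn):
    """Evaluate every band pair (i < j) and return (best R², (i, j))."""
    best_r2, best_pair = -1.0, None
    for i, j in combinations(range(len(spectra[0])), 2):
        values = [index_fn(s[i], s[j]) for s in spectra]
        r2 = r_squared(values, ssc)
        if r2 > best_r2:
            best_r2, best_pair = r2, (i, j)
    return best_r2, best_pair


# Synthetic four-band spectra for six samples (illustration only, not measured data).
toy_spectra = [
    [0.10, 0.50, 0.45, 0.30],
    [0.12, 0.52, 0.44, 0.28],
    [0.09, 0.55, 0.43, 0.33],
    [0.11, 0.53, 0.47, 0.31],
    [0.13, 0.51, 0.40, 0.29],
    [0.08, 0.54, 0.46, 0.32],
]
# SSC built as an exact linear function of band 1 minus band 2.
toy_ssc = [3 + 40 * (s[1] - s[2]) for s in toy_spectra]
best_r2, best_pair = best_band_pair(toy_spectra, toy_ssc, dsi)
```

The same loop applies to NDSI and RSI by passing a different `index_fn`. Because RSI is not symmetric in its two bands, the reversed order would need a second pass.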

2.6 Construction of full-spectrum PLS SSC prediction model

Partial least squares (PLS) regression is a widely used multivariate modeling technique, particularly suitable for datasets with many predictors, strong collinearity, noise, or limited sample size. It has been extensively applied to Vis/NIR spectroscopy for predicting fruit quality attributes (Zheng et al., 2024). In this study, a full-spectrum PLS model was developed to predict SSC, serving as a baseline for evaluating the performance of the models constructed in Section 2.4. During model development, the optimal number of latent variables (LVs) was determined at the point where the explained variance approached 100% and the residual variance approached zero. The calibration and validation of the PLS model were performed using the dataset described in Section 2.2.

2.7 Model evaluation

The performance of the models was assessed using the following statistical metrics: the coefficients of determination for calibration and prediction (Rc2 and Rp2), the root mean square errors of calibration and prediction (RMSEC and RMSEP), and the mean relative error (MRE). A higher R² indicates better predictive ability, whereas lower RMSE and MRE values are desirable (Li H. et al., 2024). In addition, the coefficient of variation (CV) was used to quantify the variability in SSC across different tomato cultivars. These metrics were calculated as shown in Equations 9–12:

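\[ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{9} \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{10} \]
\[ \mathrm{MRE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \times 100 \tag{11} \]
\[ \mathrm{CV} = \frac{1}{\bar{y}} \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \times 100 \tag{12} \]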

where yi is the observed value; y¯ is the mean of the observed values; y^i is the predicted value; and n is the sample size.
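The four metrics can be written compactly as below. This is a minimal sketch of the standard definitions of R², RMSE, MRE (in percent), and CV (in percent, using the sample standard deviation with n − 1). The function names and the three-value example are illustrative only.

```python
import math


def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def rmse(y, y_hat):
    """Root mean square error, in the units of y (here °Brix)."""
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat)) / len(y))


def mre(y, y_hat):
    """Mean relative error in percent."""
    return sum(abs((pi - yi) / yi) for yi, pi in zip(y, y_hat)) / len(y) * 100


def cv(y):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    mean_y = sum(y) / len(y)
    sd = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (len(y) - 1))
    return sd / mean_y * 100


# Illustrative values only (not measured data).
observed = [4.0, 5.0, 6.0]
predicted = [4.5, 5.0, 5.5]
metrics = (r2_score(observed, predicted), rmse(observed, predicted),
           mre(observed, predicted), cv(observed))
```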

3 Results

3.1 Variability of tomato SSC

The tomato fruits used in this study were classified into four types: large red, medium red, red cherry, and yellow cherry. A total of 152 samples were analyzed, with SSC values ranging from 3.0 to 8.5°Brix (Table 1). Pronounced differences in SSC distribution were observed among the four tomato types. The five large-red varieties exhibited relatively low SSC values, suggesting lower sweetness levels. Their SSC dispersion was limited, with most varieties showing CVs of approximately 4%, except for Mingzhi, which showed a slightly higher CV exceeding 6%. The medium-red group consisted of three varieties and exhibited greater SSC heterogeneity. In particular, HTO-44 showed substantial variability, with a CV exceeding 15%. The cherry-type tomatoes included five varieties and were characterized by generally higher SSC values. However, their variability differed among varieties: three showed moderate dispersion with CVs close to 5%, whereas the remaining two exhibited higher variability, with CVs equal to or greater than 12%. Overall, the wide SSC range and the pronounced variability observed across tomato types and varieties provide a robust data basis for developing reliable SSC prediction models, which is beneficial for improving both model accuracy and generalization performance.

Table 1. Statistical summary of tomato sample types and their soluble solids content.

3.2 Spectral reflectance characteristics of different tomato types

Figure 2 presents the spectral reflectance curves for the four tomato types. In the visible region below 600 nm, reflectance was generally low for all samples. A distinct red-edge feature appeared near 600 nm. Compared with the other tomato types, yellow cherry tomatoes exhibited a noticeable blue shift in the red-edge position. In the 600–1300 nm region, multiple reflectance peaks and troughs were observed, indicating rich spectral features that are potentially informative for SSC modelling. These reflectance features were sensitive to tomato type. In particular, the troughs allowed discrimination among large, medium, and cherry tomatoes, whereas differentiation between red and yellow cherry types was limited due to substantial overlap of their spectral curves. By contrast, the 1400–2400 nm region exhibited relatively flat reflectance patterns, with nearly overlapping curves for red and yellow cherry tomatoes. This suggests that this spectral range provides limited discriminatory information for differentiating tomato types (Figure 2).

Figure 2
A graph showing reflectance against wavelength in nanometers, comparing four colored lines: large red (red), medium red (blue), red cherry (green), and yellow cherry (yellow). The reflectance peaks around 800 nanometers and shows different patterns for each line.

Figure 2. Spectral reflectance curves of different tomato types (large red, medium red, red cherry, and yellow cherry) across the measured wavelength range.

3.3 GA-optimized linearly transformed reflectance spectral indices for prediction models

Using the ten datasets described in Section 2.2, GA was applied to construct the ltDSI, ltNDSI, and ltRSI indices. These indices were then used to develop SSC prediction models, and the calibration performance is summarized in Supplementary Table S2. For Dataset 1, the three spectral indices are given in Equations 13–15, and their predictive performance is shown in Figure 3. As shown in Supplementary Table S2, only the indices derived from Dataset 8 selected different wavelength pairs, and their Rc2 values were relatively low. For Dataset 5, ltDSI, ltNDSI, and ltRSI all selected the same pair of wavelengths (805 and 845 nm). For the remaining eight datasets, all three indices consistently selected 805 and 835 nm. Overall, these wavelength pairs yielded high Rc2 values, indicating that 805 and 835 nm are sensitive wavelengths for SSC prediction.

Figure 3
Three scatter plots display the relationship between different indices (ltDSI, ltNDSI, ltRSI) and soluble solids content (SSC) in degrees Brix. The plots show data points for large red, medium red, red cherry, and yellow cherry markers. Each plot includes a trend line with corresponding equations and correlation coefficients: the left plot \( R_c^2 = 0.836 \), middle \( R_c^2 = 0.832 \), and right \( R_c^2 = 0.824 \). Each also includes RMSEC values.

Figure 3. Prediction performance of GA-optimized spectral indices (ltDSI, ltNDSI, and ltRSI) for tomato SSC using the calibration Dataset 1.

The validation performance for Datasets 1–5 is presented in Figure 4, where all data points cluster closely around the 1:1 line. The corresponding validation results for Datasets 6–10 are provided in Supplementary Figure S3, showing similar performance trends. Across all datasets, the models achieved Rp2 values of approximately 0.8, RMSEP values around 0.6°Brix, and MRE values near 10%. For a given dataset, the Rp2, RMSEP, and MRE of ltDSI, ltNDSI, and ltRSI were highly similar, except for Dataset 8. As illustrated in Figure 4; Supplementary Figure S3, none of the models exhibited sensitivity to tomato variety, demonstrating strong generalization capability.

Figure 4
Scatter plots showing predicted versus measured soluble solids content in °Brix for five datasets comparing different tomato types: large red, medium red, red cherry, and yellow cherry. Each dataset has plots for indices ltDSI, ltNDSI, and ltRSI. Each plot includes R-squared values, RMSEP, and MRE to indicate prediction accuracy. Different shapes and colors represent tomato types: squares for large red, circles for medium red, triangles for red cherry, and inverted triangles for yellow cherry. Dotted lines indicate the 1:1 reference line.

Figure 4. Relationships between GA-optimized predicted and measured SSC values for the validation Datasets 1–5, with the dashed line indicating the 1:1 reference.

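\[ \mathrm{ltDSI}_{805,835} = 4.772 R_{805} - 4.773 R_{835} + 5.410 \tag{13} \]
\[ \mathrm{ltNDSI}_{805,835} = \frac{7.202 R_{805} + 5.974 R_{835} - 0.330}{7.202 R_{805} - 5.974 R_{835} - 5.639} \tag{14} \]
\[ \mathrm{ltRSI}_{805,835} = \frac{2.717 R_{805} + 1.378}{3.808 R_{835} + 1.979} \tag{15} \]

where R805 and R835 denote the reflectance at 805 nm and 835 nm, respectively.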

Figure 5 illustrates the evolutionary process of the GA in constructing spectral indices using Dataset 1. It shows the fitness of the best individual (defined as Rc2 in this study), the corresponding band combinations (λ1 and λ2), and the mean fitness of the population. When the GA was used to optimize ltDSI, ltNDSI, and ltRSI, different band combinations were selected at the initial stage. After approximately 40 generations, all three indices converged to the same band pair of 805 and 835 nm. It is worth noting that, during the evolutionary process, the GA also identified other band combinations that achieved high predictive performance, with Rc2 values exceeding 0.8. Examples of these optimized indices include ltDSI935,715, ltDSI805,905, and ltDSI805,895; ltNDSI935,705, ltNDSI845,805, ltNDSI805,895, and ltNDSI805,845; as well as ltRSI835,795, ltRSI715,935, and ltRSI805,895.

Figure 5
Three line graphs compare fitness metrics over generations for ltDSI, ltNDSI, and ltRSI. Each graph shows best fitness (red), average fitness (blue), and indicators for the best individual's λ1 (pink) and λ2 (green) across 50 generations. Axes are labeled with fitness, generation, and band/nm.

Figure 5. Evolution of the best fitness (Rc2), average population fitness, and corresponding wavelength combinations (λ1, λ2) during GA optimization of spectral indices.

A virtual reflectance landscape was constructed using the spectral reflectance of large red tomatoes within the 555–1205 nm range to further illustrate spectral sensitivity (Figure 2). The side projection of this landscape corresponds to the spectral reflectance curve as a function of wavelength. Both the X- and Y-axes represent wavelengths from 555 to 1205 nm, and the Z-axis represents the reflectance values, defined as Z(x,y)=min(R(x),R(y)), where R(x) and R(y) are reflectance values at wavelengths x and y, respectively, and min() denotes the minimum operator. The number of generations was limited to 10 when constructing GA-optimized ltDSI, ltNDSI, and ltRSI models. At the end of evolution, band combinations with R2 ≥ 0.7 were plotted on the surface of the virtual landscape as green spheres (Figure 6). The optimal band pairs obtained at the end of GA evolution (Figure 5) were also mapped in the landscape as red spheres. As shown in Figure 6, the green spheres were concentrated in a small region of the landscape, with λ1 approximately 700–850 nm and λ2 820–1150 nm.
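The landscape surface defined above can be sketched in a few lines. The function name `virtual_landscape` and the short reflectance vector are illustrative assumptions. The point of the sketch is that Z(x, y) = min(R(x), R(y)) is symmetric and that the maximum of each row recovers R(x), which is why the side projection reproduces the reflectance curve.

```python
def virtual_landscape(reflectance):
    """Return the surface Z[i][j] = min(R[i], R[j]) as a nested list."""
    return [[min(rx, ry) for ry in reflectance] for rx in reflectance]


# Short illustrative reflectance vector (not measured data).
toy_reflectance = [0.20, 0.55, 0.60, 0.48, 0.52]
z = virtual_landscape(toy_reflectance)

# Viewed from the side, the highest point above each x is R(x) itself,
# because min(R(x), R(y)) never exceeds R(x) and equals it at y = x.
side_projection = [max(row) for row in z]
```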

Figure 6
Three 3D surface plots displaying the correlation coefficient R-squared across different wavelengths and reflectance. Graphs (A), (B), and (C) use colored scales to indicate values ranging from 0.1 to 0.8. Green dots represent band combinations with R-squared greater than 0.7, while red dots denote optimal band combinations. Each plot includes axes labeled with λ1 and λ2 in nanometers and reflectance.

Figure 6. Virtual spectral reflectance landscape illustrating the distribution of SSC-sensitive wavelength combinations (Rc2 ≥ 0.7) identified after 10 GA generations for (A) ltDSI, (B) ltNDSI, and (C) ltRSI.

3.4 Two-band conventional spectral index–based prediction models

Figure 7 shows the distribution of the R2 values for SSC prediction using conventional two-band spectral indices, where red areas indicate R2 > 0.6. The red area was much larger for DSIλ1,λ2 (Figure 7A) than for NDSIλ1,λ2 (Figure 7B), RSIλ1,λ2 (Figure 7C), and RSIλ2,λ1 (Figure 7D). For SSC prediction, when λ1 ranged from 700 to 900 nm and λ2 ranged from 830 to 1130 nm, multiple band combinations achieved satisfactory performance, with R2 values between 0.65 and 0.82. Additionally, when λ1 was 600–800 nm and λ2 was > 1200 nm, DSI achieved a relatively higher prediction accuracy, with R2 values between 0.50 and 0.62. The highest R2 values (>0.65) of NDSIλ1,λ2 were concentrated in narrow ranges, with λ1 ranging from 700 to 830 nm and λ2 ranging from 830 to 930 nm, yielding a maximum R2 of 0.73. The distribution of RSI was similar to that of DSI, with overlapping sensitive regions (Figure 7).

Figure 7
Four graphs labeled A, B, C, and D show two-dimensional correlation spectroscopic data with axes labeled λ1 and λ2 ranging from 400 to 2200 nanometers. Each graph displays a color map with a color bar ranging from 0.0 to 1.0, indicating R-squared values. Insets in each graph zoom in on a specific region, highlighted by arrows, showing variations in color patterns for detailed analysis.

Figure 7. Heat maps of Rc2 values between SSC and conventional two-band spectral indices across all wavelength combinations: (A) DSIλ1,λ2, (B) NDSIλ1,λ2, (C) RSIλ1,λ2, and (D) RSIλ2,λ1.

Based on Figure 7, the best-performing conventional two-band spectral indices were selected to construct SSC prediction models. For DSI, the indices DSI805,835, DSI805,895, and DSI795,845 were identified, whereas NDSI805,845, RSI805,845, and RSI845,805 were selected for the other index types. As shown in Figure 8, DSI805,835, DSI805,895, DSI795,845, NDSI805,845, and RSI805,845 exhibited negative correlations with SSC, whereas RSI845,805 showed a positive correlation. Among all conventional indices, DSI-based models achieved the highest calibration performance, with Rc2 values of 0.836, 0.816, and 0.808, while the remaining indices yielded Rc2 values only marginally above 0.720. Notably, the distribution pattern of data points corresponding to large red tomatoes deviated from the overall trend in models based on NDSI and RSI, indicating a pronounced sensitivity of these models to fruit type. This type-dependent behavior suggests limited generalization when conventional indices are applied to heterogeneous datasets. The validation results further confirmed the superior performance of DSI-based models (Supplementary Figure S4), which exhibited lower RMSEP values (0.64–0.66°Brix) and a lower MRE (approximately 10.8%) compared with NDSI- and RSI-based models (RMSEP > 0.71°Brix and MRE > 11.0%).

Figure 8
Six scatter plots display relationships between spectral indices and soluble solid content (SSC) in degrees Brix, with data subsets for large red, medium red, red cherry, and yellow cherry. Lines of best fit with equations, R-squared, and RMSEC values are shown. Each plot uses different indices: DSI, NDSI, and RSI with specified wavelength ranges.

Figure 8. Prediction performance of selected conventional spectral indices for tomato SSC on Validation Dataset 1.

We noticed that two subplots in Figure 8, corresponding to SSC prediction using DSI805,835 and DSI805,895, exhibit partial visual similarity. This similarity primarily arises from the strong linear relationship between the reflectance at 835 nm and 895 nm (R2 = 0.985), which in turn leads to a strong linear correlation between the corresponding spectral indices DSI805,835 and DSI805,895 (R2 = 0.989), as illustrated in Supplementary Figure S5. In addition, the subplots based on NDSI805,845 and RSI805,845 show an even higher degree of global similarity in Figure 8. This can be attributed to the strong linear relationship between NDSI and RSI derived from the same pair of spectral bands in the present dataset. Let the reflectance at two wavelengths be denoted as R1 and R2. NDSI and RSI are defined as NDSI = (R1 − R2)/(R1 + R2) and RSI = R1/R2, respectively. By rearranging these expressions, the following relationship can be obtained: 1/NDSI = 1 + 2/(RSI − 1). When RSI is close to 1, |RSI − 1| ≪ 2, and NDSI can be approximated as NDSI ≈ 0.5·RSI − 0.5, indicating a strong linear dependence between the two indices. As shown in Figure 8, RSI805,845 values in this study mainly range from 1.0 to 1.1 and are therefore very close to 1, which explains the strong linear relationship between NDSI and RSI observed in our results. This is further supported by the highly similar distribution patterns in Figures 7B, C. Similar behavior has also been reported by Tanaka et al. (2015), who observed a very high degree of similarity between NDSI- and RSI-based models for leaf area index estimation (see Figure 3 in their study).
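The near-linear dependence between the two indices can be checked numerically. The sketch below uses the exact identity NDSI = (RSI − 1)/(RSI + 1), which follows from the two definitions, and compares it with the linear approximation 0.5·RSI − 0.5 over the RSI range of 1.0 to 1.1 reported for this dataset. The function names and the grid of RSI values are illustrative choices.

```python
def ndsi_from_rsi(rsi):
    """Exact relation: NDSI = (R1 - R2) / (R1 + R2) = (RSI - 1) / (RSI + 1)."""
    return (rsi - 1) / (rsi + 1)


def ndsi_linear(rsi):
    """Linear approximation valid when RSI is close to 1."""
    return 0.5 * rsi - 0.5


# RSI values spanning the 1.0-1.1 range reported in the text.
rsi_grid = [1.0 + 0.01 * k for k in range(11)]
abs_errors = [abs(ndsi_from_rsi(v) - ndsi_linear(v)) for v in rsi_grid]
max_abs_error = max(abs_errors)
```

The gap between the two expressions equals (RSI − 1)²/(2(RSI + 1)), so it grows quadratically and stays below 0.0025 across this range, against NDSI values of up to about 0.048.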

3.5 Full-spectrum PLS models for SSC prediction

Using PLS, full-spectrum SSC prediction models were constructed based on the ten datasets described in Section 2.2, and their predictive performance is summarized in Table 2. These models were used as benchmark models for comparison with the GA-optimized linearly transformed reflectance spectral index–based prediction models. For all datasets, the optimal number of LVs was three. Across the ten datasets, the Rp2 values ranged from 0.539 to 0.767, with a mean of 0.668. The RMSEP varied between 0.720 and 1.012°Brix, with an average of 0.822°Brix. The MRE ranged from 9.7% to 14.3%, with a mean of 12.2%. Overall, the predictive accuracy of the full-spectrum PLS models was moderate and showed noticeable variation among datasets. These differences can be attributed to variations in sample composition between the calibration and validation sets, resulting from the random dataset partitioning strategy.

Table 2. Prediction performance of full-spectrum partial least squares (PLS) models for SSC using different calibration and validation dataset partitions.
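For readers who want to reproduce the three statistics reported in Table 2, the sketch below shows one common way to compute them. It rests on assumptions: the formulas follow standard definitions (coefficient of determination as one minus the ratio of residual to total sum of squares, MRE as the mean of absolute error divided by the measured value) and may differ in detail from those given in the Methods section. The sample values are hypothetical.

```python
import math

def prediction_metrics(y_true, y_pred):
    """Return (R2, RMSEP, MRE in percent) for a validation set.

    Assumed definitions, not necessarily those of the original study:
      R2    = 1 - SS_res / SS_tot
      RMSEP = sqrt(mean squared prediction error)
      MRE   = mean(|measured - predicted| / measured) * 100
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmsep = math.sqrt(ss_res / n)
    mre = 100.0 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / n
    return r2, rmsep, mre

# Hypothetical measured and predicted SSC values in °Brix.
measured = [4.0, 5.0, 6.0, 8.0]
predicted = [4.5, 5.0, 5.5, 8.0]

r2, rmsep, mre = prediction_metrics(measured, predicted)
print(f"R2={r2:.3f}  RMSEP={rmsep:.3f} °Brix  MRE={mre:.1f}%")
```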

To further compare model performance under identical data conditions, paired t-tests were conducted using the $R_p^2$, RMSEP, and MRE values of models developed from the same datasets (Table 2, Figure 4; Supplementary Figure S3). The results indicated that SSC prediction models based on the ltDSI significantly outperformed the corresponding PLS models, with p-values of 0.001, $3.30 \times 10^{-4}$, and 0.004 for $R_p^2$, RMSEP, and MRE, respectively (all p < 0.01). Similarly, models based on ltNDSI (p = $4.65 \times 10^{-4}$, $1.63 \times 10^{-4}$, and 0.002) and ltRSI (p = 0.001, $4.52 \times 10^{-4}$, and 0.004) also showed significantly superior predictive performance compared with the PLS models. These results demonstrate that the GA-optimized spectral index–based models provide more accurate SSC predictions than full-spectrum PLS models under the same dataset conditions.
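The paired comparison described above can be sketched as follows. This is an illustration only: the RMSEP values are hypothetical (five pairs rather than the ten dataset partitions used in the study), and the function returns the test statistic and degrees of freedom without the p-value. A two-sided p-value would come from Student's t distribution with n − 1 degrees of freedom, for example via `scipy.stats.ttest_rel` where SciPy is available.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Paired t statistic for matched samples a and b.

    Each pair (a[i], b[i]) must come from the same dataset partition.
    Returns (t, degrees_of_freedom).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample standard deviation
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical RMSEP values (°Brix) on the same five partitions.
rmsep_index = [0.60, 0.62, 0.58, 0.65, 0.61]   # index-based model
rmsep_pls = [0.80, 0.85, 0.75, 0.90, 0.78]     # full-spectrum PLS model

t_stat, dof = paired_t_statistic(rmsep_index, rmsep_pls)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

A negative statistic here means the first model has the lower RMSEP on average. Pairing by partition removes the between-partition variability noted in the previous paragraph from the comparison.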

4 Discussion

4.1 Spectral mechanisms underlying SSC prediction

The consistent selection of wavelength pairs around 805 and 835 nm across multiple datasets suggests that this spectral region is closely associated with tomato SSC. Fruit SSC is mainly determined by soluble sugars, including glucose, fructose, and sucrose (McGlone and Kawano, 1998; Zhou et al., 2023), with minor contributions from organic acids (Zhan et al., 2023; Cai et al., 2024). In tomatoes, glucose and fructose are the dominant sugars, whereas citric acid and malic acid are the primary organic acids (Ibáñez et al., 2019). Glucose and fructose are an aldose and a ketose, respectively, but they share very similar chemical structures (Qi and Tester, 2019). Each molecule contains multiple hydroxyl (O–H) groups and one carbonyl (C=O) group, which makes them spectrally difficult to distinguish in intact fruit. As a result, SSC-related spectral information in the near-infrared region mainly reflects the collective absorption and scattering effects of these functional groups rather than individual sugar species.

Previous studies have reported that absorption features near 800 nm are associated with combination and overtone vibrations of O–H bonds (Afonso et al., 2022). Given the abundance of hydroxyl groups in glucose and fructose, this spectral region is likely sensitive to changes in soluble sugar concentration. This provides a plausible optical explanation for the consistent selection of 805 and 835 nm by the GA in SSC prediction. Notably, the sensitive wavelengths identified in this study fall within the spectral range (780–980 nm) previously reported for tomato SSC prediction (Peiris et al., 1998) and overlap with the optimal wavelength range identified for pear SSC estimation (700–930 nm) (Choi et al., 2017). In contrast, these wavelengths differ substantially from those reported for kiwifruit SSC prediction, which are mainly concentrated between 900 and 930 nm (McGlone and Kawano, 1998). This discrepancy may be attributed to differences in SSC composition, as sucrose accounts for a higher proportion of total soluble sugars in kiwifruit compared with tomatoes. The stability of these wavelengths across datasets further indicates that they capture robust SSC-related spectral responses rather than cultivar-specific features.

In addition, other wavelength combinations identified during the GA evolutionary process (Figure 5) may be related to higher-order overtone vibrations of O–H and C–H bonds (Afonso et al., 2022; Zheng et al., 2024). This suggests that incorporating additional spectral bands may further improve SSC prediction performance, which warrants investigation in future studies.

4.2 Advantages of GA-optimized linearly transformed spectral indices

Conventional two-band spectral indices, such as DSI, NDSI, and RSI, are typically constructed using fixed mathematical formulations and predefined wavelength combinations (Kasimati et al., 2022; Wang Q. et al., 2022). Although these indices are simple and computationally efficient, their rigid structure limits adaptability when applied to heterogeneous datasets involving multiple cultivars or varying spectral baselines. As a result, their predictive performance and generalization ability are often sensitive to dataset composition, as illustrated in Figure 8.

In contrast, the GA-optimized linearly transformed spectral indices proposed in this study simultaneously optimize both wavelength selection and transformation coefficients. This integrated optimization enables the indices to adapt to variations in spectral magnitude and baseline shifts caused by cultivar differences and measurement conditions. Compared with approaches that first select wavelengths and subsequently construct prediction models (Ouyang et al., 2012; González-Flor et al., 2014), the proposed strategy offers clear advantages in terms of both prediction accuracy and model generalization. The reduced cultivar sensitivity observed across multiple independently generated datasets indicates that jointly optimizing index structure and parameters plays a critical role in enhancing model robustness and transferability. Rather than relying on fixed index formulations, the improved performance appears to stem from increased model flexibility, which allows the indices to accommodate biological and spectral variability more effectively.

Similar challenges associated with cultivar sensitivity have been reported in previous studies. For example, Ibáñez et al. (2019) demonstrated that SSC prediction performance based on diffuse reflectance near-infrared spectra varied substantially across tomato cultivars. Likewise, Kusumiyati et al. (2022) reported difficulties in developing a universal SSC prediction model for melon. In the present study, SSC prediction models based on conventional two-band spectral indices also exhibited pronounced cultivar sensitivity. In contrast, the GA-optimized models based on linearly transformed reflectance indices showed neither cultivar-specific nor tomato-type-specific sensitivity. This contrast suggests that cultivar dependence in SSC prediction may be strongly influenced by the structural form of the prediction model rather than by crop type alone.
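To make the idea of a linearly transformed index concrete, the sketch below shows one possible parameterization. It is an assumption for illustration: the exact functional form of ltDSI, ltNDSI, and ltRSI is defined in the Methods section and may differ. Here each band reflectance is transformed as a·R + b before the usual difference, normalized difference, or ratio is taken, and all coefficient and reflectance values are hypothetical.

```python
# Illustrative parameterization only: each band is transformed as a*R + b.
# The study's actual index definitions may differ from this sketch.

def _transform(reflectance, coeffs):
    """Apply a linear transform (scale a, offset b) to one reflectance."""
    a, b = coeffs
    return a * reflectance + b

def lt_dsi(r1, r2, band1=(1.0, 0.0), band2=(1.0, 0.0)):
    """Difference index on linearly transformed reflectances."""
    return _transform(r1, band1) - _transform(r2, band2)

def lt_ndsi(r1, r2, band1=(1.0, 0.0), band2=(1.0, 0.0)):
    """Normalized difference index on linearly transformed reflectances."""
    t1, t2 = _transform(r1, band1), _transform(r2, band2)
    return (t1 - t2) / (t1 + t2)

def lt_rsi(r1, r2, band1=(1.0, 0.0), band2=(1.0, 0.0)):
    """Ratio index on linearly transformed reflectances."""
    return _transform(r1, band1) / _transform(r2, band2)

# Hypothetical reflectances at two near-infrared bands.
r_a, r_b = 0.52, 0.50

# With identity coefficients the indices reduce to conventional DSI/NDSI/RSI.
conventional_dsi = lt_dsi(r_a, r_b)

# Hypothetical coefficients such as an optimizer might propose.
adapted_dsi = lt_dsi(r_a, r_b, band1=(1.2, 0.01), band2=(0.9, -0.02))

print(conventional_dsi, adapted_dsi)
```

Under this parameterization, a search procedure such as a GA would treat the two wavelengths and the four coefficients as one candidate solution and score it by prediction error, so the fixed-form indices become the special case with scale 1 and offset 0.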

Full-spectrum PLS models are widely used as benchmark approaches for SSC prediction because they can exploit information from the entire spectral range and effectively address multicollinearity (Zheng et al., 2024). However, in the present study, these models achieved only moderate predictive performance and were consistently outperformed by the GA-optimized ltDSI, ltNDSI, and ltRSI models, as confirmed by the paired t-test results in Section 3.5. Although substantially higher prediction accuracy has been reported in some previous studies, such as the PLS-based model developed by He et al. (2005) with an RMSEP of 0.16°Brix, the discrepancy is likely attributable to differences in dataset heterogeneity. Specifically, He et al. (2005) focused on a single tomato cultivar, whereas the present study included four tomato types comprising thirteen cultivars, introducing greater biological and spectral variability and imposing a more stringent test of model generalization.

Compared with both conventional two-band indices and full-spectrum PLS models, the GA-optimized spectral indices achieved comparable or higher prediction accuracy while maintaining a much simpler model structure. More importantly, their predictive performance remained stable across ten independently generated datasets, providing strong evidence of robustness and generalization. Unlike PLS models, which require the full spectral range and careful optimization of latent variables, the proposed indices rely on only two wavelengths and simple arithmetic operations. This simplicity makes them particularly suitable for implementation in low-cost multispectral sensors and real-time applications, without sacrificing predictive reliability. Given that agricultural products often exhibit substantial biological variability due to differences in cultivar, growing season, production year, and geographical origin (Tian et al., 2023), future studies should further evaluate the effects of seasonal, annual, and geographical variability to enhance model robustness and transferability.
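The computational simplicity mentioned above can be illustrated with a short sketch. It assumes, for illustration only, that SSC is related to a two-band index through a single straight-line calibration; the calibration pairs, reflectances, and function names are hypothetical and are not taken from the study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_ssc(r1, r2, slope, intercept):
    """Predict SSC from two band reflectances via a difference index."""
    index = r1 - r2
    return slope * index + intercept

# Hypothetical calibration pairs: index value and measured SSC (°Brix).
index_values = [0.01, 0.02, 0.03, 0.04]
ssc_values = [4.0, 5.0, 6.0, 7.0]

slope, intercept = fit_line(index_values, ssc_values)

# Prediction for a new fruit with hypothetical reflectances.
estimate = predict_ssc(0.525, 0.500, slope, intercept)
print(f"slope={slope:.1f}  intercept={intercept:.1f}  SSC={estimate:.2f} °Brix")
```

Once the two calibration constants are stored, each prediction needs one subtraction, one multiplication, and one addition, which is the kind of workload a low-cost two-band sensor could handle in real time.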

4.3 Field-based spectral acquisition: advantages and challenges

As shown in Figure 1, spectral data in this study were acquired under outdoor conditions rather than in a fully controlled laboratory environment, such as measurements conducted in a darkroom (Huang et al., 2021; Zheng et al., 2024). Compared with laboratory-based measurements, field-based spectral acquisition inevitably introduces additional sources of variability, including fluctuations in ambient illumination and sensor–target geometry effects (Nicolai et al., 2007). These factors may increase spectral noise and complicate subsequent data processing and modeling.

Despite these challenges, field-based spectral acquisition also offers important advantages that help explain differences among reported SSC prediction performances in the literature. First, outdoor measurements more closely reflect realistic sensing conditions encountered in practical agricultural applications, thereby improving the ecological validity of the developed models. Second, the increased spectral variability inherent to field conditions provides a more stringent test of model robustness and generalization ability. Models that perform well under such conditions are therefore more likely to maintain reliable performance when deployed in real-world scenarios. In addition, outdoor measurements can avoid certain instabilities associated with artificial illumination sources commonly used in laboratory settings, such as lamp aging and power fluctuations (Cen and He, 2007).

It should also be noted that different spectral acquisition modes may contribute to discrepancies among studies. Transmittance-based measurements are often limited by the penetration depth of near-infrared radiation, particularly when applied to thick-peeled fruits, which restricts their ability to probe internal quality attributes (Todorova et al., 2024). In contrast, the present study employed reflectance-based spectral measurements under field conditions, which are less constrained by fruit thickness and are more suitable for non-destructive assessment of internal parameters in intact tomato fruits. This difference in measurement geometry provides a plausible explanation for variations in SSC prediction performance reported across studies using different acquisition strategies.

5 Conclusions

This study demonstrates that hyperspectral reflectance combined with GA optimization provides a robust and transferable framework for the non-destructive estimation of tomato SSC across multiple fruit types and cultivars. The linearly transformed reflectance–based spectral indices (ltDSI, ltNDSI, and ltRSI) consistently outperformed traditional spectral indices and full-spectrum PLS models and exhibited markedly reduced sensitivity to cultivar differences, underscoring the importance of index design in capturing physiologically relevant spectral information associated with SSC. These results indicate that carefully optimized spectral indices can effectively extract biologically meaningful signals related to sugar accumulation while remaining resilient to cultivar-induced spectral variability. Because this study focused on fully ripe fruits under specific measurement conditions, further validation across different developmental stages and field environments is needed to support broader physiological interpretation and practical application. Overall, this work provides a scalable, non-destructive approach for assessing fruit internal quality and contributes to the development of field-applicable phenotyping tools in horticultural crop research.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Author contributions

NL: Conceptualization, Data curation, Formal Analysis, Funding acquisition, Methodology, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing. YF: Formal Analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. YZZ: Investigation, Methodology, Validation, Visualization, Writing – original draft. YC: Investigation, Validation, Visualization, Writing – original draft. YZ: Funding acquisition, Validation, Visualization, Writing – original draft, Writing – review & editing. XZ: Validation, Visualization, Writing – original draft. JG: Investigation, Validation, Writing – original draft. XL: Investigation, Validation, Writing – original draft.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This research was funded by Provincial–Ministerial Co-Construction Collaborative Innovation Center Project for Regional Modern Agriculture and Environmental Protection (HSXT3025), and Earmarked Fund for Sichuan Innovation Team Program of CARS (No. SCCXTD-2024-22).

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2026.1729375/full#supplementary-material

References

Afonso, A. M., Antunes, M. D., Cruz, S., Cavaco, A. M., and Guerra, R. (2022). Non-destructive follow-up of ‘Jintao’ kiwifruit ripening through VIS-NIR spectroscopy – individual vs. average calibration model’s predictions. Postharvest Biol. Technol. 188, 111895. doi: 10.1016/j.postharvbio.2022.111895

Ali, M. M., Anwar, R., Rehman, R. N. U., Ejaz, S., Ali, S., Yousef, A. F., et al. (2022). Sugar and acid profile of loquat (Eriobotrya japonica Lindl.), enzymes assay and expression profiling of their metabolism-related genes as influenced by exogenously applied boron. Front. Plant Sci. 13. doi: 10.3389/fpls.2022.1039360

Almoujahed, M. B., Whetton, R. L., and Mouazen, A. M. (2025). Data fusion of visible near-infrared and mid-infrared spectroscopy combined with feature selection and machine learning for rapid discrimination of fusarium head blight infection in wheat kernel and flour. Infrared Phys. Technol. 151, 106072. doi: 10.1016/j.infrared.2025.106072

Cai, L., Zhang, Y., Cai, Z., Shi, R., Li, S., and Li, J. (2024). Detection of soluble solids content in tomatoes using full transmission Vis-NIR spectroscopy and combinatorial algorithms. Front. Plant Sci. 15. doi: 10.3389/fpls.2024.1500819

Cen, H. and He, Y. (2007). Theory and application of near infrared reflectance spectroscopy in determination of food quality. Trends Food Sci. Technol. 18, 72–83. doi: 10.1016/j.tifs.2006.09.003

Chea, C., Saengprachatanarug, K., Posom, J., Wongphati, M., and Taira, E. (2020). Sugar yield parameters and fiber prediction in sugarcane fields using a multispectral camera mounted on a small unmanned aerial system (UAS). Sugar Tech. 22, 605–621. doi: 10.1007/s12355-020-00802-5

Choi, J., Chen, P., Lee, B., Yim, S., Kim, M., Bae, Y., et al. (2017). Portable, non-destructive tester integrating VIS/NIR reflectance spectroscopy for the detection of sugar content in Asian pears. Sci. Hortic. 220, 147–153. doi: 10.1016/j.scienta.2017.03.050

Elsayed, S., Gala, H., Abd El-baki, M. S., Maher, M., Elbeltagi, A., Salem, A., et al. (2025). Hyperspectral technology and machine learning models to estimate the fruit quality parameters of mango and strawberry crops. PLoS One 20, e0313397. doi: 10.1371/journal.pone.0313397

Friedman, M. (2013). Anticarcinogenic, cardioprotective, and other health benefits of tomato compounds lycopene, α-tomatine, and tomatidine in pure form and in fresh and processed tomatoes. J. Agric. Food. Chem. 61, 9534–9550. doi: 10.1021/jf402654e

Friedman, M. (2015). Chemistry and anticarcinogenic mechanisms of glycoalkaloids produced by eggplants, potatoes, and tomatoes. J. Agric. Food. Chem. 63, 3323–3337. doi: 10.1021/acs.jafc.5b00818

González-Flor, C., Serrano, L., Gorchs, G., and Pons, J. M. (2014). Assessment of grape yield and composition using reflectance-based indices in rainfed vineyards. Agron. J. 106, 1309–1316. doi: 10.2134/agronj13.0422

He, Y., Zhang, Y., Pereira, A. G., Gómez, A. H., and Wang, J. (2005). Nondestructive determination of tomato fruit quality characteristics using VIS/NIR spectroscopy technique. Int. J. Inf. Technol. 11, 97–108.

Huang, Y., Dong, W., Chen, Y., Wang, X., Luo, W., Zhan, B., et al. (2021). Online detection of soluble solids content and maturity of tomatoes using Vis/NIR full transmittance spectra. Chemometrics Intell. Lab. Syst. 210, 104243. doi: 10.1016/j.chemolab.2021.104243

Ibáñez, G., Cebolla-Cornejo, J., Martí, R., Roselló, S., and Valcárcel, M. (2019). Non-destructive determination of taste-related compounds in tomato using NIR spectra. J. Food Eng. 263, 237–242. doi: 10.1016/j.jfoodeng.2019.07.004

Jaywant, S. A., Singh, H., and Arif, K. M. (2022). Sensors and instruments for brix measurement: a review. Sensors 22, 2290. doi: 10.3390/s22062290

Kasimati, A., Espejo-García, B., Darra, N., and Fountas, S. (2022). Predicting grape sugar content under quality attributes using normalized difference vegetation index data and automated machine learning. Sensors 22, 3249. doi: 10.3390/s22093249

Khuriyati, N. and Matsuoka, T. (2004). Near infrared transmittance method for nondestructive determination of soluble solids content in growing tomato fruits. Environ. Control Biol. 42, 217–223. doi: 10.2525/ecb1963.42.217

Kim, S., Hong, S., Kim, E., Lee, C., and Kim, G. (2023). Application of ensemble neural-network method to integrated sugar content prediction model for citrus fruit using Vis/NIR spectroscopy. J. Food Eng. 338, 111254. doi: 10.1016/j.jfoodeng.2022.111254

Kusumiyati, K., Hadiwijaya, Y., Sutari, W., and Munawar, A. A. (2022). Global model for in-field monitoring of sugar content and color of melon pulp with comparative regression approach. AIMS Agric. Food 7, 312–315. doi: 10.3934/agrfood.2022020

Kwon, R. S., Lee, G. Y., Lee, S., and Song, J. (2024). Antimicrobial properties of tomato juice and peptides against typhoidal Salmonella. Microbiol. Spectr. 12, e03102–e03123. doi: 10.1128/spectrum.03102-23

Li, H., Zhu, L., Li, N., Liu, Z., Wang, L., Chitrakar, B., et al. (2024). NIR spectroscopy for quality assessment and shelf-life prediction of kiwifruit. Postharvest Biol. Technol. 218, 113201. doi: 10.1016/j.postharvbio.2024.113201

Li, S., Li, J., Wang, Q., Shi, R., Yang, X., and Zhang, Q. (2024). Determination of soluble solids content of multiple varieties of tomatoes by full transmission visible-near infrared spectroscopy. Front. Plant Sci. 15. doi: 10.3389/fpls.2024.1324753

Lu, R. (2001). Predicting firmness and sugar content of sweet cherries using near-infrared diffuse reflectance spectroscopy. Trans. ASAE 44, 1265. doi: 10.13031/2013.6421

Lu, R., Guyer, D. E., and Beaudry, R. M. (2000). Determination of firmness and sugar content of apples using near-infrared diffuse reflectance. J. Texture Stud. 31, 615–630. doi: 10.1111/j.1745-4603.2000.tb01024.x

McGlone, V. A. and Kawano, S. (1998). Firmness, dry-matter and soluble-solids assessment of postharvest kiwifruit by NIR spectroscopy. Postharvest Biol. Technol. 13, 131–141. doi: 10.1016/S0925-5214(98)00007-6

Nicolai, B. M., Beullens, K., Bobelyn, E., Peirs, A., Saeys, W., Theron, K. I., et al. (2007). Nondestructive measurement of fruit and vegetable quality by means of NIR spectroscopy: A review. Postharvest Biol. Technol. 46, 99–118. doi: 10.1016/j.postharvbio.2007.06.024

Ouyang, A., Xie, X., Zhou, Y., and Liu, Y. (2012). Partial least squares regression variable screening studies on apple soluble solids NIR spectral detection. Spectrosc. Spectr. Anal. 32, 2680–2684. doi: 10.3964/j.issn.1000-0593(2012)10-2680-05

Peiris, K., Dull, G. G., Leffler, R. G., and Kays, S. J. (1998). Near-infrared (NIR) spectrometric technique for nondestructive determination of soluble solids content in processing tomatoes. J. Am. Soc. Hortic. Sci. 123, 1089–1093. doi: 10.21273/JASHS.123.6.1089

Qi, X. and Tester, R. F. (2019). Fructose, galactose and glucose – In health and disease. Clin. Nutr. ESPEN 33, 18–28. doi: 10.1016/j.clnesp.2019.07.004

Radzevičius, A., Viškelis, J., Karklelienė, R., Juškevičienė, D., and Viškelis, P. (2016). Determination of tomato quality attributes using near infrared spectroscopy and reference analysis. Zemdirbyste 103, 91–98. doi: 10.13080/z-a.2016.103.012

Roger, J. M. and Bellon-Maurel, V. (2000). Using genetic algorithms to select wavelengths in near-infrared spectra: application to sugar content prediction in cherries. Appl. Spectrosc. 54, 1313–1320. doi: 10.1366/0003702001951237

Rongtong, B., Suwonsichon, T., Ritthiruangdej, P., and Kasemsumran, S. (2018). Determination of water activity, total soluble solids and moisture, sucrose, glucose and fructose contents in osmotically dehydrated papaya using near-infrared spectroscopy. Agric. Nat. Resour. 52, 557–564. doi: 10.1016/j.anres.2018.11.023

Shahi, B., Dahal, S., Mishra, A., Kumar, S. B. V., and Kumar, C. P. (2016). A review over genetic algorithm and application of wireless network systems. Proc. Comput. Sci. 78, 431–438. doi: 10.1016/j.procs.2016.02.085

Sun, D., Cruz, J., Alcalà, M., Romero Del Castillo, R., Sans, S., and Casals, J. (2021). Near infrared spectroscopy determination of chemical and sensory properties in tomato. J. Near Infrared Spectrosc. 29, 289–300. doi: 10.1177/09670335211018759

Tanaka, S., Kawamura, K., Maki, M., Muramoto, Y., Yoshida, K., and Akiyama, T. (2015). Spectral index for quantifying leaf area index of winter wheat by field hyperspectral measurements: A case study in Gifu prefecture, central Japan. Remote Sens. 7, 5329–5346. doi: 10.3390/rs70505329

Tang, C., He, H., Li, E., and Li, H. (2018). Multispectral imaging for predicting sugar content of ‘Fuji’ apples. Opt. Laser Technol. 106, 280–285. doi: 10.1016/j.optlastec.2018.04.017

Tian, S., Liu, W., and Xu, H. (2023). Improving the prediction performance of soluble solids content (SSC) in kiwifruit by means of near-infrared spectroscopy using slope/bias correction and calibration updating. Food Res. Int. 170, 112988. doi: 10.1016/j.foodres.2023.112988

Tian, H., Ying, Y., Lu, H., Fu, X., and Yu, H. (2007). Measurement of soluble solids content in watermelon by Vis/NIR diffuse transmittance technique. J. Zhejiang Univ. Sci. B 8, 105–110. doi: 10.1631/jzus.2007.B0105

Todorova, M., Veleva, P., Atanassova, S., Georgieva, T., Vasilev, M., and Zlatev, Z. (2024). Assessment of tomato quality through Near-Infrared spectroscopy—advantages, limitations, and integration with multivariate analysis techniques. Eng. Proc. 70, 34. doi: 10.3390/engproc2024070034

Wang, Q., Che, Y., Shao, K., Zhu, J., Wang, R., Sui, Y., et al. (2022). Estimation of sugar content in sugar beet root based on UAV multi-sensor data. Comput. Electron. Agric. 203, 107433. doi: 10.1016/j.compag.2022.107433

Wang, F., Zhao, C., Yang, H., Jiang, H., Li, L., and Yang, G. (2022). Non-destructive and in-site estimation of apple quality and maturity by hyperspectral imaging. Comput. Electron. Agric. 195, 106843. doi: 10.1016/j.compag.2022.106843

Xia, Y., Fan, S., Li, J., Tian, X., Huang, W., and Chen, L. (2020). Optimization and comparison of models for prediction of soluble solids content in apple by online Vis/NIR transmission coupled with diameter correction method. Chemometrics Intell. Lab. Syst. 201, 104017. doi: 10.1016/j.chemolab.2020.104017

Xu, S., Wang, H., Liang, X., and Lu, H. (2024). Research progress on methods for improving the stability of non-destructive testing of agricultural product quality. Foods 13, 3917. doi: 10.3390/foods13233917

Zhan, B., Li, P., Li, M., Luo, W., and Zhang, H. (2023). Detection of soluble solids content (SSC) in pears using near-Infrared spectroscopy combined with LASSO–GWF–PLS model. Agriculture 13, 1491. doi: 10.3390/agriculture13081491

Zheng, Y., Liu, P., Zheng, Y., and Xie, L. (2024). Improving SSC detection accuracy of cherry tomatoes by feature synergy and complementary spectral bands combination. Postharvest Biol. Technol. 213, 112922. doi: 10.1016/j.postharvbio.2024.112922

Zhou, J., Yang, S., Ma, Y., Liu, Z., Tu, H., Wang, H., et al. (2023). Soluble sugar and organic acid composition and flavor evaluation of Chinese cherry fruits. Food Chemistry: X 20, 100953. doi: 10.1016/j.fochx.2023.100953

Keywords: genetic algorithm, hyperspectral spectroscopy, linearly transformed indices, spectral indices, spectral reflectance landscape, tomato SSC

Citation: Liu N, Fang Y, Zhao Y, Chen Y, Zheng Y, Zha X, Guo J and Li X (2026) Novel estimation of tomato soluble solids content using linearly transformed reflectance-based spectral indices. Front. Plant Sci. 17:1729375. doi: 10.3389/fpls.2026.1729375

Received: 21 October 2025; Revised: 30 December 2025; Accepted: 22 January 2026; Published: 05 February 2026.

Edited by:

Liujun Xiao, Nanjing Agricultural University, China

Reviewed by:

Syed Tahir Ata-Ul-Karim, Aarhus University, Denmark
Qibing Zhu, Jiangnan University, China

Copyright © 2026 Liu, Fang, Zhao, Chen, Zheng, Zha, Guo and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Naisen Liu, boomzip@163.com
