ORIGINAL RESEARCH article

Front. Oncol., 05 February 2026

Sec. Breast Cancer

Volume 16 - 2026 | https://doi.org/10.3389/fonc.2026.1739346

Machine learning model based on dual-layer detector spectral CT radiomics features for differentiating luminal and non-luminal breast cancer

Zhijing Song1, Yikun Ma2, Zhiyang Dou1, Bo Shi1*
  • 1Anhui Key Laboratory of Digital Medicine and Intelligent Health, School of Medical Imaging, Bengbu Medical University, Bengbu, China
  • 2Department of Radiology, Nanjing Medical University Affiliated Cancer Hospital, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, Nanjing, China

Objective: This study aims to explore the value of a machine learning (ML) model based on dual-layer detector spectral CT (DLCT) radiomic features in predicting Luminal versus non-Luminal breast cancer (BC).

Methods: A retrospective analysis was conducted on 128 pathologically confirmed BC patients from the Department of Breast Surgery, Jiangsu Cancer Hospital. DLCT chest enhancement images were analyzed, with regions of interest delineated to extract radiomic features. Optimal features were selected through univariate analysis, correlation analysis, and LASSO algorithm, followed by ML model construction.

Results: A total of 1,037 radiomic features were extracted, from which 13 optimal features were selected. Combined with clinical parameters (age, body mass index (BMI), and menopausal status), seven ML models were constructed. Among them, the Gaussian Naive Bayes (GNB) model demonstrated the best performance, achieving an area under the receiver operating characteristic curve (AUC) of 0.778 (95% CI: 0.582–0.974), accuracy of 0.821, sensitivity of 0.833, and specificity of 0.778, outperforming the other six models.

Conclusions: The GNB model demonstrated relatively superior and stable predictive performance in internal testing, suggesting that DLCT radiomics may offer a potential auxiliary tool for distinguishing between Luminal and non-Luminal BC. However, further validation through large-scale multicenter studies is required.

1 Introduction

Breast cancer (BC) is one of the most common malignancies in women worldwide, particularly affecting women aged 30 to 45 years (1). It is the second most deadly malignancy affecting women (2), representing a major health concern for female populations. According to the St. Gallen International Breast Cancer Expert Consensus (3), BC can be classified into four subtypes based on four immunohistochemical markers (4). Among these, Luminal A and Luminal B types are categorized as Luminal-type BC, accounting for approximately 60% to 70% of all cases (5), and typically respond well to endocrine therapy (6, 7). Meanwhile, HER2-OE and triple-negative types are classified as non-Luminal BC, often requiring targeted therapy or more intensive treatment regimens, and are associated with poorer prognosis (8–10). Since these two types show significant differences in treatment selection and prognosis (11), accurate preoperative differentiation of subtypes is crucial for developing individualized treatment plans. Currently, BC molecular subtyping primarily relies on histopathological detection methods such as immunohistochemistry and in situ hybridization, but these techniques are invasive and prone to sampling bias (12).

In recent years, radiomics has emerged as a research focus for preoperative BC subtyping due to its non-invasive nature and strong reproducibility. Numerous studies have attempted to develop predictive models based on imaging modalities such as magnetic resonance imaging (MRI) and ultrasound (US) (13–16). However, MRI examinations are costly, time-consuming, and susceptible to motion artifacts (17). More importantly, the multi-parameter scanning characteristics of MRI affect the stability of radiomic features, thereby compromising model generalizability (18). US demonstrates limited capability in discriminating small lesions (19), while its operator-dependent image acquisition leads to poor reproducibility and biological consistency of extracted radiomic features (20). Therefore, there is an urgent need for more efficient and precise radiomic approaches to optimize preoperative BC subtyping. As an emerging imaging technology, dual-layer detector spectral CT (DLCT) offers high resolution and provides multiple spectral images, thereby expanding radiomics research possibilities (21–23). Preliminary studies have demonstrated the effectiveness of DLCT radiomics in predicting malignant tumors (24, 25), yet its application in BC molecular subtyping remains unexplored.

Therefore, this study aims to innovatively integrate DLCT radiomic features with machine learning (ML) algorithms to construct a non-invasive discrimination model for Luminal versus non-Luminal BC, exploring the potential of DLCT radiomics for precise BC subtyping.

2 Methods

2.1 Patients

This retrospective study analyzed BC patients treated at the Department of Breast Surgery, Jiangsu Cancer Hospital from October 2021 to July 2024. Inclusion criteria were: (1) BC confirmed by histopathological examination; (2) preoperative contrast-enhanced DLCT chest examination. Patients were excluded for: (1) prior surgery, radiotherapy, or chemotherapy (n=4); (2) an interval of more than 1 week between imaging and pathological confirmation (n=3); (3) suboptimal image quality that hampered ROI delineation (n=3); (4) incomplete clinicopathological records (n=5). After applying these selection criteria, 128 patients qualified for final analysis.

2.2 Clinical and histopathological analysis

We analyzed clinical and pathological data from all patients, including age, BMI, menopausal status, and immunohistochemistry (IHC) results. Based on the 2013 St. Gallen International Breast Cancer Expert Consensus (3), patients were classified into four molecular subtypes using four IHC markers: estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), and Ki-67 proliferation index. The classification criteria were: (1) Luminal A: ER(+) or PR(+), HER2(-), Ki-67 <20%; (2) Luminal B: ① ER(+) or PR(+), HER2(-), Ki-67 ≥20%; ② ER(+) or PR(+), HER2(+), any Ki-67; (3) HER2-OE: ER(-), PR(-), HER2(+), any Ki-67; (4) TNBC: ER(-), PR(-), HER2(-), any Ki-67. For subsequent analysis, Luminal A and B subtypes were grouped as Luminal-type BC, while HER2-OE and TNBC were classified as non-Luminal BC.
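The four-marker rule can be written as a small decision function. The following is a minimal sketch, not the authors' code: the function names are hypothetical, marker positivity is assumed to be already dichotomized, and the Ki-67 cutoff for HER2-negative Luminal B is assumed to be the complement of the Luminal A rule (20% or higher).

```python
def classify_subtype(er_pos: bool, pr_pos: bool, her2_pos: bool, ki67_percent: float) -> str:
    """Return the molecular subtype for one patient (hypothetical helper)."""
    hormone_receptor_pos = er_pos or pr_pos
    if hormone_receptor_pos:
        if her2_pos:
            # Luminal B, HER2-positive variant: any Ki-67 value
            return "Luminal B"
        # HER2-negative: Ki-67 separates Luminal A from Luminal B
        # (assumption: cutoff of 20%, the complement of the Luminal A rule)
        return "Luminal A" if ki67_percent < 20 else "Luminal B"
    # ER(-) and PR(-): HER2 status decides between HER2-OE and TNBC
    return "HER2-OE" if her2_pos else "TNBC"


def luminal_group(subtype: str) -> str:
    """Collapse the four subtypes into the binary label used for modeling."""
    return "Luminal" if subtype in {"Luminal A", "Luminal B"} else "non-Luminal"


print(luminal_group(classify_subtype(True, False, False, 10)))   # Luminal
print(luminal_group(classify_subtype(False, False, True, 40)))   # non-Luminal
```
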

2.3 DLCT image acquisition

All patients underwent preoperative contrast-enhanced DLCT chest examinations using the IQon spectral CT scanner (Philips Healthcare, Best, The Netherlands). Patients were positioned supine with a scanning range extending from the lung apex to the costophrenic angle level. For contrast-enhanced imaging, a non-ionic iodinated contrast agent (ioversol, 350 mg iodine/mL, Hengrui Pharmaceuticals, Lianyungang, China) was administered intravenously at an injection rate of 2.5-3.0 mL/s, followed by a 20 mL saline flush at 2.5 mL/s. Post-injection scanning was initiated after a 50-second delay. The scanning parameters were as follows: tube voltage 120 kVp, automatic tube current modulation, detector configuration 64 × 0.625 mm, pitch 0.900, rotation time 0.50 s, matrix size 512 × 512, field of view 372 mm, scan slice thickness 5 mm, and reconstruction slice thickness 1 mm.

2.4 Image segmentation and radiomics feature extraction

This study performed radiomics feature extraction based on 55 keV images. Previous research has demonstrated that images at this keV level exhibit good image quality and an optimal contrast-to-noise ratio (26, 27). Two experienced radiologists, blinded to clinical and pathological findings, used 3D-Slicer software to manually delineate regions of interest (ROI) slice-by-slice along lesion contours on 55 keV monochromatic images. Any discrepancies in ROI delineation were resolved through consensus. After ROI delineation, images were resampled to 1×1×1 mm voxels, and feature extraction was performed using the PyRadiomics package. Extracted features included: (1) Original features: 14 shape features, 18 first-order statistical features, and 75 texture features (including 24 from gray-level co-occurrence matrix (GLCM), 16 from gray-level size zone matrix (GLSZM), 16 from gray-level run length matrix (GLRLM), 5 from neighboring gray-tone difference matrix (NGTDM), and 14 from gray-level dependence matrix (GLDM)); (2) Transformed features obtained after image filtering using wavelet transform (combining high-pass and low-pass filters in three directions) and Laplacian of Gaussian filters (3, 5).
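The per-class counts above can be checked against the total of 1,037 features reported in the Results. The short calculation below is a sketch under three assumptions that the text does not state explicitly: the three-direction wavelet transform yields 8 sub-band images (LLL to HHH), "(3, 5)" denotes two Laplacian of Gaussian filter settings, and shape features are computed once on the ROI mask rather than repeated for each filtered image.

```python
# Per-image feature counts as listed in the text
SHAPE = 14
FIRST_ORDER = 18
TEXTURE = 24 + 16 + 16 + 5 + 14  # GLCM, GLSZM, GLRLM, NGTDM, GLDM

# Assumption: low/high-pass filtering in 3 directions gives 2**3 sub-bands
WAVELET_SUBBANDS = 2 ** 3
# Assumption: "(3, 5)" means two LoG filter settings, i.e. two filtered images
LOG_IMAGES = 2

original = SHAPE + FIRST_ORDER + TEXTURE
# Assumption: shape features are not recomputed on filtered images
per_filtered_image = FIRST_ORDER + TEXTURE
total = original + (WAVELET_SUBBANDS + LOG_IMAGES) * per_filtered_image

print(original, per_filtered_image, total)  # 107 93 1037
```

Under these assumptions the arithmetic reproduces the reported total, which suggests the listed feature classes account for the full feature set.
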

2.5 Radiomics feature selection and model construction

The dataset was randomly divided into training and test sets at a ratio of 7:3. Z-score normalization was applied exclusively to the training set data, and the Synthetic Minority Oversampling Technique (SMOTE) was employed to achieve a 1:1 ratio between Luminal and non-Luminal samples. The working principle of this method involves randomly generating a certain number of new samples along the line segments between existing minority class samples. This approach not only addresses the class imbalance issue but also effectively reduces the risk of model overfitting, thereby enhancing the model’s generalization performance to some extent (28, 29). In the training set, univariate analysis was performed on the data, followed by correlation testing to remove redundant features, retaining only one feature when r > 0.7. The Lasso algorithm (with 5-fold cross-validation) was then applied to eliminate features with coefficients of zero. This method can effectively handle multicollinearity among features and offers higher computational efficiency compared to iterative wrapper methods, such as Recursive Feature Elimination. Moreover, the results of the Lasso algorithm (with 5-fold cross-validation) represent a subset of the original feature space (30). In contrast to Principal Component Analysis, this approach facilitates the direct presentation of the radiomics features that drive the model’s decision-making. Finally, the selected features were ranked by importance based on the model coefficients to identify stable and key features. The test set remained completely isolated throughout the entire feature selection process and was used solely for final performance evaluation. After balancing the data in the training set using the SMOTE method, a 10-fold cross-validation approach was employed, dynamically partitioning the data into training and validation sets. The model was trained on the training and validation sets and ultimately evaluated on the test set.
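The interpolation step that SMOTE relies on can be illustrated in a few lines. This is a simplified pure-Python sketch of the idea, not the implementation used in the study (which the text does not name); the function name, the neighbour count `k=5`, and the toy data sizes are all illustrative assumptions.

```python
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples, each on the line segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        partner = rng.choice(neighbours)
        t = rng.random()  # position along the segment, 0 <= t < 1
        synthetic.append([b + t * (p - b) for b, p in zip(base, partner)])
    return synthetic


# Toy two-feature minority class; sizes are illustrative only
data_rng = random.Random(1)
minority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(24)]
new_points = smote_like_oversample(minority, n_new=41)
print(len(minority) + len(new_points))  # 65
```

Because every synthetic point is a convex combination of two real minority samples, oversampling must be confined to the training data, as described above, so that no test-set information leaks into the synthetic samples.
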

Seven ML algorithms were used to construct radiomics models: logistic regression (LR), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), adaptive boosting (AdaBoost), random forest (RF), Gaussian naive Bayes (GNB), and support vector machine (SVM). Model performance was evaluated using the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC) and its 95% confidence interval (95% CI), accuracy, sensitivity, and specificity. The models were assessed through 10-fold cross-validation and further validated using test set data. For the calculation of classification performance metrics such as sensitivity and specificity, the predicted probabilities from the models were binarized using the default threshold of 0.5: probabilities ≥ 0.5 were assigned as positive, and probabilities < 0.5 were assigned as negative. For the best-performing model, confusion matrix plots and learning curve plots were generated, and visual interpretation was conducted using the Shapley additive explanations (SHAP) explainability analysis method. The above modeling process was based on the Python programming language (version 3.11.4). All models were trained using the default parameters of their standard implementation libraries. The XGBoost model was implemented using xgboost=2.0.1, the LightGBM model was implemented using lightgbm=3.2.1, and the other models were implemented using scikit-learn=1.1.3.
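The distinction between threshold-dependent metrics and the threshold-free AUC can be made concrete with a small example. The sketch below uses made-up labels and probabilities; the helper names are hypothetical, and the AUC is computed from its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case) rather than with any particular library.

```python
def binarize(probabilities, threshold=0.5):
    """Probabilities >= threshold are positive (1), otherwise negative (0)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def roc_auc(labels, scores):
    """Rank-based AUC: fraction of positive/negative pairs in which the
    positive case has the higher score (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data only: 1 = positive class, 0 = negative class
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.2]

print(binarize(scores))                   # [1, 1, 0, 1, 0]
print(round(roc_auc(labels, scores), 3))  # 0.833
```

Moving the threshold changes the binarized predictions, and with them sensitivity and specificity, while the AUC stays the same because it depends only on the ordering of the scores.
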

2.6 Statistical analysis

Statistical analyses were performed using SPSS Statistics 27.0 (IBM Corp, Chicago, Illinois, United States of America). The Shapiro-Wilk test was used to assess data normality. Normally distributed continuous data were expressed as mean ± standard deviation (SD) and compared using independent samples t-tests. Non-normally distributed continuous data were expressed as [M50 (P25, P75)] and compared using Mann-Whitney U tests. Categorical data were expressed as [n(%)] and compared using chi-square tests. A p value < 0.05 was considered statistically significant.

3 Results

3.1 Participant characteristics

This study ultimately included 128 BC patients (all female), comprising 33 cases (25.8%) of non-Luminal type with mean age 56.4 ± 10.4 years and mean BMI 24.2 ± 3.3 kg/m², and 95 cases (74.2%) of Luminal type with mean age 53.9 ± 11.7 years and mean BMI 24.6 ± 3.1 kg/m². There were 40 premenopausal patients (31.3%) and 88 postmenopausal patients (68.7%). Table 1 presents the patients’ baseline data and clinical characteristics. No statistically significant differences were observed between the two groups regarding age, BMI, or menopausal status.


Table 1. Comparison of clinical data of all patients.

3.2 Radiomics feature extraction and selection

Based on 55 keV contrast-enhanced DLCT chest images from 128 patients, a total of 1,037 radiomic features were extracted from the region of interest delineated within each patient’s lesion. According to a 7:3 ratio, the dataset was randomly divided into a training set and a test set, with 89 samples in the training set and 39 samples in the test set. After applying SMOTE, the training set comprised 130 samples. In each fold of the 10-fold cross-validation, 117 cases were used for training, and 13 cases were used for validation. Through univariate analysis and correlation analysis, 39 features were initially selected, which were further reduced to 18 features using the Lasso algorithm. The optimal regularization parameter was 0.026. Figure 1 displays the names and coefficient plot of the selected radiomics features. Figure 2 displays the importance ranking of these 18 features based on model coefficient analysis. Ultimately, the top 13 features with low correlation but high discriminative power were selected for subsequent modeling analysis.
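The sample counts in this paragraph can be reconciled with simple arithmetic. The calculation below is a consistency check rather than part of the study pipeline; it assumes the 30% test share is rounded up, and it infers the per-class composition of the training set from the balanced total of 130, which the text does not report directly.

```python
import math

n_total, n_luminal, n_non_luminal = 128, 95, 33

# Assumption: the test share of a 7:3 split is rounded up
n_test = math.ceil(0.3 * n_total)
n_train = n_total - n_test

# SMOTE adds minority samples only, so a 1:1 set of 130 implies 65 per class
n_train_balanced = 130
luminal_in_train = n_train_balanced // 2
non_luminal_in_train = n_train - luminal_in_train          # inferred
synthetic_added = luminal_in_train - non_luminal_in_train  # inferred

# Implied test-set composition (inferred)
luminal_in_test = n_luminal - luminal_in_train
non_luminal_in_test = n_non_luminal - non_luminal_in_train

# 10-fold cross-validation on the balanced training set
fold_validation = n_train_balanced // 10
fold_training = n_train_balanced - fold_validation

print(n_train, n_test)                        # 89 39
print(non_luminal_in_train, synthetic_added)  # 24 41
print(luminal_in_test, non_luminal_in_test)   # 30 9
print(fold_training, fold_validation)         # 117 13
```
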

Figure 1
Bar chart titled “Coefficients in the Lasso Model” showing various features and their coefficients ranging from negative 0.10 to positive 0.10. Features like “waveletHHH_glcm_Correlation” and “waveletLHL_glcm_Imc1” have prominent coefficients, while others like “waveletHHH_firstorder_Minimum” have minimal influence.

Figure 1. The names and coefficient plot of the radiomics features selected by the Lasso algorithm (with 5-fold cross-validation).

Figure 2
Bar chart displaying feature importance with coefficients for various variables. The top three features are waveletHHH_glszm_GrayLevelNonUniformityNormalized, waveletHHH_firstorder_Skewness, and waveletHLH_glcm_Imc2, showing the highest importance. The chart lists features vertically with importance values on the horizontal axis.

Figure 2. The importance ranking chart of 18 features. The features are arranged from top to bottom based on their importance, and the longer the bar length, the more significant the feature.

3.3 Diagnostic performance of the seven models

Using 13 radiomic features and 3 clinical features, we constructed prediction models with seven ML algorithms. Figure 3 shows the ROC curves of these seven models in both the training and validation sets. Figures 4, 5 present the performance metrics of the seven models in the training and validation sets using 10-fold cross-validation. Table 2 lists the performance metrics of the seven models in the test set. As can be seen from Figures 4, 5; Table 2, although the XGBoost, LightGBM, AdaBoost and RF models showed higher AUC values in the training and validation sets, their performance in the test set was unsatisfactory, suggesting possible overfitting. The GNB model achieved AUC values of 0.900 and 0.869 in the training and validation sets, respectively. In the test set, the model attained an AUC of 0.778 (95% CI: 0.582–0.974), accuracy of 0.821, sensitivity of 0.833, and specificity of 0.778, outperforming the other six models. The sensitivity and specificity of the GNB model are comparable, showing no severe performance bias caused by the original sample imbalance. The results indicate that the GNB model had the best predictive performance. Figure 6 displays the ROC curves of the GNB model across the training, validation, and test sets. Figure 7 shows the confusion matrix of the GNB model. Figure 8 shows the learning curves of the GNB model on the training set and validation set.

Figure 3
Two ROC curve charts comparing different models. The left chart shows training performance with AUC scores: XGBoost, LightGBM, RandomForest, and AdaBoost all score 1.000; logistic regression scores 0.874; GNB scores 0.900; SVM scores 0.750. The right chart shows validation performance: XGBoost scores 0.917; logistic regression 0.857; LightGBM 0.917; RandomForest 0.905; AdaBoost 0.933; GNB 0.869; SVM 0.745. Each model is represented by a different colored line.

Figure 3. The ROC curves of 7 models on the training set and validation set.

Figure 4
Bar chart comparing machine learning models (LR, XGBoost, LightGBM, RF, Adaboost, GNB, SVM) on accuracy, sensitivity, and specificity. Accuracy is in light blue, sensitivity in dark blue, and specificity in yellow. Most models show high values close to 1.0 across these metrics.

Figure 4. Performance metrics of the 7 models from 10-fold cross-validation in the training set. The light blue bar represents the accuracy of the model, the dark blue bar represents the sensitivity of the model, and the yellow bar represents the specificity of the model.

Figure 5
Bar chart showing performance metrics for various models: Logistic Regression (LR), XGBoost, LightGBM, Random Forest (RF), Adaboost, Gaussian Naive Bayes (GNB), and Support Vector Machine (SVM). Metrics include Accuracy, Sensitivity, and Specificity. Accuracy and Sensitivity are displayed in shades of blue, while Specificity is in yellow.

Figure 5. Performance metrics of the 7 models from 10-fold cross-validation in the validation set. The light blue bar represents the accuracy of the model, the dark blue bar represents the sensitivity of the model, and the yellow bar represents the specificity of the model.


Table 2. Performance indicators of 7 models in the test set.

Figure 6
Three ROC curve charts display model performance across different datasets. Chart (a) shows the training set with an AUC of 0.900. Chart (b) depicts the validation set with an AUC of 0.869. Chart (c) illustrates the test set with an AUC of 0.778. The ROC curves are plotted with sensitivity on the Y-axis and 1-specificity on the X-axis, alongside a diagonal reference line.

Figure 6. The ROC of the GNB model. (a–c) represent the training set, validation set and test set respectively.

Figure 7
Two confusion matrices labeled (a) and (b). Matrix (a) shows 58 true positives and 58 true negatives, with 7 false positives and 7 false negatives. Matrix (b) shows 25 true positives and 7 true negatives, with 2 false positives and 5 false negatives. Both matrices use a blue color scale to indicate values.

Figure 7. Confusion matrix plots of the GNB model for the training set and test set. (a) corresponds to the training set, and (b) corresponds to the test set.

Figure 8
Learning curve graph titled “GaussianNB Learning Curve” showing ROC AUC scores against training samples. The red dashed line represents the training set, starting near 1.0 and slightly decreasing. The blue dashed line represents the validation set, starting around 0.65 and gradually increasing.

Figure 8. Learning curves of the GNB model on the training set and validation set.

3.4 Model interpretation

Visual analysis was performed on the best-performing GNB model using SHAP method. Figure 9a displays the contribution values of all 16 features incorporated in the GNB model. Figure 9b presents the SHAP summary plot for the Gaussian Naive Bayes (GNB) model, with features ranked vertically by their mean absolute SHAP values in descending order. Higher-positioned features demonstrate stronger predictive importance for discriminating between Luminal and non-Luminal subtypes.

Figure 9
Bar chart and scatter plot showing SHAP values for features impacting a model's output. On the left, bar graph ranks features like “waveletHHH_firstorder_Skewness” by mean SHAP value magnitude. On the right, a scatter plot depicts individual SHAP value impacts, color-coded by feature value from blue (low) to pink (high). A vertical color bar indicates feature value gradient.

Figure 9. SHAP bar plot and summary plot of the GNB model. (a) SHAP bar plot of the GNB model. The vertical axis lists various features, sorted by their average impact on the model’s predictions with the most important features positioned at the top. The horizontal axis represents the absolute value of the average SHAP value for each feature’s contribution to the model’s predictions, reflecting the feature’s importance. (b) SHAP summary plot of the GNB model. The y-axis displays the features in the model, ranked by their contribution to the model’s predictions, with the most important features positioned at the top. The color of each data point represents feature values—red indicates higher values, while blue indicates lower values. The x-axis represents the SHAP value for each data point, where positive values contribute to predicting a positive outcome, and negative values contribute to predicting a negative outcome.

4 Discussion

This study is the first to propose an ML model based on DLCT chest enhancement imaging radiomic features for distinguishing Luminal from non-Luminal BC. The results demonstrate that the GNB model combining 13 radiomic features and 3 clinical features exhibits good predictive performance (AUC = 0.778). These findings provide a valuable reference for early diagnosis and precision treatment of Luminal and non-Luminal BC, offering new imaging evidence for future subtyping research.

Recent years have seen increasing research focus on developing non-invasive methods for early differentiation between Luminal and non-Luminal BC. Studies by Xu et al. (31), Feng et al. (32), and Wang et al. (33) developed models using MRI radiomic features and functional parameters, with AUC values of 0.830, 0.879, and 0.830 respectively. Liu et al. (34) developed a model using US radiomic features with an AUC of 0.752. In terms of model performance, MRI-based models outperformed US-based ones. Umutlu et al. (35) created a model using PET/MRI radiomic features with an AUC of 0.950, though without further test set validation and with higher PET/MRI costs. Liu et al. (36) developed a predictive model based on DLCT quantitative parameters with an AUC of 0.754, performing less well than our DLCT radiomics-based model. Our GNB model outperformed US radiomics-based and DLCT quantitative parameter-based models, though showed lower performance than MRI radiomics, functional parameter, and PET/MRI radiomics-based models. However, compared to MRI and PET/MRI examinations, DLCT offers higher examination efficiency and lower cost. Additionally, unlike the prone position typically used in MRI, DLCT’s supine position matches surgical positioning, minimizing potential image and lesion deviations caused by posture changes, while simultaneously evaluating skin, chest wall, internal mammary lymph nodes, bilateral axillary lymph nodes, and supraclavicular lymph nodes (37).

Among the seven ML models constructed based on the same DLCT radiomics features, Figures 4, 5; Table 2 reveal that the XGBoost, LightGBM, AdaBoost, and RF models exhibit outstanding predictive performance on both the training and validation sets (AUC = 1.000 on the training set, and AUC > 0.900 on the validation set). However, their performance significantly declines on the independent test set (AUC < 0.600). This marked performance discrepancy strongly suggests the presence of overfitting in these models. We attribute the occurrence of overfitting primarily to the high-dimensional, small-sample challenge faced in this study. A total of 1,037 features were extracted from each patient’s images, while the initial training set comprised only 89 samples, expanding to 130 after SMOTE balancing. In such a scenario where the feature dimension substantially exceeds the sample size, models are highly prone to capturing noise and spurious correlations in the training data, leading to diminished generalization capability. To mitigate overfitting, multiple safeguards were incorporated into the study pipeline, such as SMOTE, LASSO, cross-validation, and the strict establishment of an independent test set. Although overfitting was observed in some complex models, the GNB model we ultimately selected demonstrated the smallest performance gap across the training, validation, and test sets, with AUC values of 0.900, 0.869, and 0.778, respectively. Although the GNB model assumes strong independence among features, multiple previous radiomics modeling studies have reported that the GNB model demonstrated the best performance in their research (38, 39). We hypothesize that in high-dimensional, small-sample scenarios, the strong independence assumption of the GNB model can, to some extent, prevent the model from fitting noise and complex feature interactions in high-dimensional data, thereby enhancing its generalization capability. 
In contrast, more flexible models such as RF and XGBoost are more prone to overfitting. The robust generalization performance of the GNB model is further corroborated by the specific classification patterns revealed in its confusion matrix. As shown in Figure 7, the model correctly identified 25 cases of Luminal BC (true positives) and 7 cases of non-Luminal BC (true negatives) in the test set (n=39), achieving an overall accuracy of 82.1%. The model's errors exhibit a clear asymmetry, with false negatives being the primary source of error, while false positives are relatively fewer. Future feature engineering efforts could focus on these misclassified cases.
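The test-set metrics follow directly from the confusion-matrix counts. In the worked check below, the true-positive and true-negative counts are taken from the text; the split of the 7 errors into 5 false negatives and 2 false positives is an inference from the reported sensitivity (0.833) and specificity (0.778), not a value stated in the text.

```python
# Counts for the GNB model on the test set (n = 39), Luminal = positive class
tp, tn = 25, 7  # stated in the text
fn, fp = 5, 2   # assumption: inferred from the reported sensitivity and specificity

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)  # share of Luminal cases correctly identified
specificity = tn / (tn + fp)  # share of non-Luminal cases correctly identified

print(f"{accuracy:.3f} {sensitivity:.3f} {specificity:.3f}")  # 0.821 0.833 0.778
```

With only 9 non-Luminal cases in the denominator under this reading, a single additional misclassification would shift specificity by about 0.11, which is consistent with the wide confidence interval reported for the test-set AUC.
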

Radiomics converts medical images into quantitative, objective features to non-invasively explore tumor heterogeneity and characteristics (16). Existing research has demonstrated radiomics' potential for non-invasive BC subtyping, though most studies utilized MRI or US. Feng et al. (32) combined clinical factors with intratumoral subregion MRI radiomic features to develop a nomogram model (AUC = 0.830), while Wu et al. (40) created a nomogram based on ultrasound radiomic features (AUC = 0.767). Previous CT-based investigations include the work of Wang et al. (41), who created a radiomic model distinguishing Luminal from non-Luminal BC (AUC = 0.757) using CT features. To our knowledge, no prior studies have utilized DLCT radiomic features to differentiate between Luminal type and non-Luminal type BC. Our DLCT radiomics-based GNB model achieved test set AUC, accuracy, sensitivity and specificity of 0.778 (95% CI: 0.582–0.974), 0.821, 0.833 and 0.778 respectively, demonstrating good performance. Among the 13 radiomic features ultimately selected for modeling, three were shape features and first-order statistics from the original images: 'original_shape_MajorAxisLength' representing the longest axis, 'original_shape_Elongation' indicating the ratio of the shortest to longest axis, and 'original_firstorder_90Percentile' denoting the 90th percentile value. The other ten features, accounting for the largest proportion, were all wavelet-transformed first-order statistical and texture features. Bian et al. (42) found wavelet features played important roles in their multiparametric MRI-based model predicting HER2-low BC; Yang et al. (43) demonstrated strong correlations between wavelet features and neoadjuvant chemotherapy response; Zhou et al. (44) showed wavelet features' good performance in evaluating neoadjuvant chemoradiotherapy for BC patients. These findings suggest wavelet features may have predictive value in BC, consistent with our results. 
From Figure 9a, it can be observed that the top five most contributing radiomics features in the GNB model are waveletHHH_firstorder_Skewness, waveletHLL_firstorder_Kurtosis, waveletHHH_glszm_GrayLevelNonUniformityNormalized, waveletHLH_glcm_Imc2, and waveletHHL_firstorder_Median. The feature waveletHHH_firstorder_Skewness measures the asymmetry of the CT value distribution in the tumor region. Figure 9b shows that lower values of this feature are concentrated in the high SHAP value region on the right: the lower the skewness value, the more the model tends to classify the tumor as the Luminal type. In imaging, lower skewness indicates a more symmetric CT value distribution. We speculate that this may reflect a relatively homogeneous microenvironment in this type of tumor, lacking significant microcalcifications or micro-necrotic areas. Conversely, high skewness may be associated with heterogeneous components in non-Luminal types. The feature waveletHLL_firstorder_Kurtosis describes the kurtosis of the CT value distribution. As observed in Figure 9b, the lower the kurtosis value, the more the model tends to classify the tumor as Luminal type. Low kurtosis indicates a broad, dispersed distribution of voxel CT values. In contrast, high kurtosis typically signifies that voxel values are highly concentrated within a narrow range, which may correspond to highly homogeneous areas on imaging, such as large necrotic or liquefied regions or abnormally uniform areas of marked enhancement. These patterns may be more closely associated with certain features of non-Luminal types (45). The feature waveletHHH_glszm_GrayLevelNonUniformityNormalized quantifies the spatial dominance of different density regions in high-frequency texture details. Figure 9b shows that the higher its feature value, the greater its contribution to predicting the Luminal type. 
This suggests that, at the high-frequency texture scale, the microstructure of Luminal BC may be spatially dominated by a few highly homogeneous tissue components. On imaging, this may manifest as dominant regions composed of large, relatively uniform glandular parenchyma or stromal components. The feature waveletHLH_glcm_Imc2 evaluates the complexity and regularity of pixel gray-level co-occurrence patterns. Figure 9b shows that the lower its feature value, the greater its contribution to predicting the Luminal type. We hypothesize that the relatively regular structure of Luminal BC results in a low Imc2 value at the algorithmic level. Conversely, a high Imc2 may correspond to extreme homogenization of large-scale structures, such as extensive necrosis or abnormally uniform areas of enhancement, which are common features of non-Luminal BC. It is noteworthy that this study found that higher values of the waveletHHL_firstorder_Median feature (indicative of higher enhancement) positively contribute to predicting the Luminal type. This appears to contradict some conventional imaging views that non-Luminal types have richer blood supply (46). We propose that this discrepancy may arise for the following reasons. First, this study evaluates the median CT value in a specific frequency subband (HHL) after wavelet transformation, rather than the average enhancement of the entire tumor on the original images. The enhancement patterns observed at this specific texture scale may differ biologically from overall enhancement. Second, radiomics features capture spatial distribution and texture patterns. A high Median value may more strongly reflect the concentration and consistency of enhancement distribution at the HHL scale, rather than merely the peak enhancement intensity. Luminal BC may exhibit more uniform and consistent overall enhancement, whereas non-Luminal BC may display heterogeneous, focal marked enhancement (47).

The GNB model based on DLCT radiomics developed in this study provides proof of concept for a rapid, objective, and non-invasive tool for preoperative BC subtyping. We envision the following clinical integration pathway. The model is designed to serve as an auxiliary diagnostic tool, integrated into the post-processing stage of BC imaging examinations. After a patient undergoes contrast-enhanced chest DLCT scanning, a radiologist or technician can invoke the model on a PACS workstation or a dedicated radiomics analysis platform. Once the 55 keV images are loaded and the ROI is delineated, the model automatically extracts radiomic features from the ROI and executes the prediction algorithm. Within seconds, the system generates a structured report containing predicted probability values, confidence intervals, and visualizations of key discriminative features, among other information. The results can serve as an auxiliary reference for radiologists, reducing subjective variability in diagnosis. Moreover, before pathological results are available, the report can give clinicians preliminary insight into the likely molecular subtype, facilitating earlier planning of treatment strategy discussions.
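
The envisioned workflow ends with a prediction step and a structured report. The sketch below illustrates, under stated assumptions, how a Gaussian Naive Bayes classifier turns feature values into class probabilities and a simple per-feature summary. The class means, variances, priors, and input values are hypothetical placeholders, not the fitted parameters of the model in this study, and only two of the selected feature names are used. The per-feature log-likelihood ratio is a simple stand-in for the SHAP analysis, not an implementation of it, and confidence intervals are omitted.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def gnb_report(features, params, priors, positive="Luminal", negative="non-Luminal"):
    """Build a small structured report from a Gaussian Naive Bayes prediction.

    features: {feature_name: value}
    params:   {class_name: {feature_name: (mean, variance)}}
    priors:   {class_name: prior_probability}
    """
    # Naive Bayes: log joint = log prior + sum of independent per-feature log densities
    log_joint = {
        cls: math.log(priors[cls])
        + sum(gaussian_logpdf(features[name], *params[cls][name]) for name in features)
        for cls in params
    }
    # Normalise to posterior probabilities (subtract the maximum for numerical stability)
    peak = max(log_joint.values())
    weights = {cls: math.exp(v - peak) for cls, v in log_joint.items()}
    total = sum(weights.values())
    probabilities = {cls: w / total for cls, w in weights.items()}
    # Per-feature evidence: positive values favour the `positive` class
    evidence = {
        name: gaussian_logpdf(features[name], *params[positive][name])
        - gaussian_logpdf(features[name], *params[negative][name])
        for name in features
    }
    return {
        "predicted_subtype": max(probabilities, key=probabilities.get),
        "probabilities": probabilities,
        "log_likelihood_ratio_toward_luminal": evidence,
    }


# Hypothetical parameters on standardised features (NOT the study's fitted values)
HYPOTHETICAL_PARAMS = {
    "Luminal": {
        "waveletHHH_firstorder_Skewness": (-0.5, 1.0),
        "waveletHHL_firstorder_Median": (0.5, 1.0),
    },
    "non-Luminal": {
        "waveletHHH_firstorder_Skewness": (0.5, 1.0),
        "waveletHHL_firstorder_Median": (-0.5, 1.0),
    },
}
HYPOTHETICAL_PRIORS = {"Luminal": 0.5, "non-Luminal": 0.5}

# Hypothetical case: low skewness and high median after standardisation
case = {"waveletHHH_firstorder_Skewness": -1.0, "waveletHHL_firstorder_Median": 1.0}
report = gnb_report(case, HYPOTHETICAL_PARAMS, HYPOTHETICAL_PRIORS)
print(report)
```

For this toy case both features favour the Luminal class, in line with the directions of effect described for skewness and Median in the SHAP discussion. A deployed system would load fitted parameters and extracted feature values instead of the placeholders used here.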

This study has several limitations. First, the relatively limited sample size may affect the statistical power and stability of the model. Second, all data were derived from a single center, where patient population characteristics, imaging acquisition equipment, and protocols are relatively uniform. This may limit the generalizability of the model to other institutions, different equipment, or diverse patient populations. We plan to design a prospective study for real-time validation with consecutively enrolled patients at our institution and to actively seek multi-center collaborations. Third, despite the use of the SMOTE algorithm for data balancing, the imbalance between the numbers of Luminal and non-Luminal patients may have affected model performance. Future studies should continue to collect cases with an emphasis on balancing the two groups and should include comparative experiments with and without SMOTE. Fourth, this study employed manual ROI segmentation. Although strict standardized protocols were followed to ensure quality, the absence of a multi-observer consistency test may affect the reproducibility of the method. In subsequent research, we will prioritize semi-automatic segmentation algorithms or organize multi-center observer agreement studies to further enhance reproducibility. Fifth, this study did not systematically evaluate the impact of variations in imaging acquisition and reconstruction parameters on the stability of radiomic features. Future research should incorporate rigorous testing of feature stability to identify robust features that are insensitive to technical parameters.
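
For the planned comparison with and without SMOTE, it may help to recall what the oversampling step does. The following sketch shows the core interpolation idea in plain Python: each synthetic sample is placed on the line segment between a sample of the smaller class and one of its nearest neighbours from the same class. The function name, the feature values, and the choice of k are hypothetical, and this is a simplified illustration rather than the implementation used in the study.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to `base`
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + lam * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical, already standardised two-feature training samples of the smaller class
minority_train = [[0.1, 1.2], [0.3, 0.9], [0.2, 1.5], [0.6, 1.1]]
new_points = smote_like_oversample(minority_train, n_new=6)

for point in new_points:
    print([round(v, 3) for v in point])
```

Because every synthetic point is an interpolation of existing cases, oversampling is usually applied to the training split only, so that test performance is measured on real patients.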

5 Conclusion

Based on DLCT radiomic features, this study preliminarily explored and constructed seven ML models. Among these, the GNB model demonstrated relatively superior and stable predictive performance in internal testing. The findings suggest that DLCT radiomics may offer a potential auxiliary tool for distinguishing between Luminal and non-Luminal BC, thereby potentially aiding early diagnosis and preliminary discussions on treatment strategies. This study provides preliminary evidence and hypotheses for this field, but clinical translation will require further validation in large-scale, prospective, multicenter studies.

Data availability statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. Requests to access the datasets should be directed to Bo Shi, shibo@bbmu.edu.cn.

Ethics statement

The studies involving humans were approved by the Ethics Committee of Jiangsu Cancer Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

ZS: Writing – original draft, Validation, Formal Analysis, Data curation, Software. YM: Data curation, Writing – review & editing. ZD: Writing – original draft, Data curation. BS: Conceptualization, Methodology, Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This research was supported by the “512” Outstanding Talents Fostering Project of Bengbu Medical University (grant number BY51201312) and the Postgraduate Research Innovation Project of Bengbu Medical University (grant number Byycxz24016).

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Johnson RH, Anders CK, Litton JK, Ruddy KJ, and Bleyer A. Breast cancer in adolescents and young adults. Pediatr Blood Cancer. (2018) 65:e27397. doi: 10.1002/pbc.27397

2. Siegel RL, Giaquinto AN, and Jemal A. Cancer statistics, 2024. CA Cancer J Clin. (2024) 74:12–49. doi: 10.3322/caac.21820

3. Goldhirsch A, Winer EP, Coates AS, Gelber RD, Piccart-Gebhart M, Thürlimann B, et al. Personalizing the treatment of women with early breast cancer: highlights of the St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2013. Ann Oncol. (2013) 24:2206–23. doi: 10.1093/annonc/mdt303

4. Harbeck N and Gnant M. Breast cancer. Lancet. (2017) 389:1134–50. doi: 10.1016/S0140-6736(16)31891-8

5. Ignatiadis M and Sotiriou C. Luminal breast cancer: from biology to treatment. Nat Rev Clin Oncol. (2013) 10:494–506. doi: 10.1038/nrclinonc.2013.124

6. Loibl S, Poortmans P, Morrow M, Denkert C, and Curigliano G. Breast cancer. Lancet. (2021) 397:1750–69. doi: 10.1016/S0140-6736(20)32381-3

7. Liedtke C, Rody A, Gluz O, Baumann K, Beyer D, Kohls EB, et al. The prognostic impact of age in different molecular subtypes of breast cancer. Breast Cancer Res Treat. (2015) 152:667–73. doi: 10.1007/s10549-015-3491-3

8. Johansson ALV, Trewin CB, Hjerkind KV, Ellingjord-Dale M, Johannesen TB, Ursin G, et al. Breast cancer-specific survival by clinical subtype after 7 years follow-up of young and elderly women in a nationwide cohort. Int J Cancer. (2019) 144:1251–61. doi: 10.1002/ijc.31950

9. Howlader N, Cronin KA, Kurian AW, and Andridge R. Differences in breast cancer survival by molecular subtypes in the United States. Cancer Epidemiol Biomarkers Prev. (2018) 27:619–26. doi: 10.1158/1055-9965.EPI-17-0627

10. Sopik V, Sun P, and Narod SA. The prognostic effect of estrogen receptor status differs for younger versus older breast cancer patients. Breast Cancer Res Treat. (2017) 165:391–402. doi: 10.1007/s10549-017-4333-2

11. Ma M, Liu R, Wen C, Xu W, Xu Z, Wang S, et al. Predicting the molecular subtype of breast cancer and identifying interpretable imaging features using machine learning algorithms. Eur Radiol. (2022) 32:1652–62. doi: 10.1007/s00330-021-08271-4

12. Qi YJ, Su GH, You C, Zhang X, Xiao Y, Jiang YZ, et al. Radiomics in breast cancer: Current advances and future directions. Cell Rep Med. (2024) 5:101719. doi: 10.1016/j.xcrm.2024.101719

13. Leithner D, Horvat JV, Marino MA, Bernard-Davila B, Jochelson MS, Ochoa-Albiztegui RE, et al. Radiomic signatures with contrast-enhanced magnetic resonance imaging for the assessment of breast cancer receptor status and molecular subtypes: initial results. Breast Cancer Res. (2019) 21:106. doi: 10.1186/s13058-019-1187-z

14. Ozkan EE, Sengul SS, Erdogan M, Gurdal O, and Eroglu HE. 18F-fluorodeoxyglucose PET/computed tomography in locoregional staging and assessment of biological and clinical aggressiveness of breast cancer subtypes. Nucl Med Commun. (2019) 40:1043–50. doi: 10.1097/MNM.0000000000001073

15. Saha A, Harowicz MR, Grimm LJ, Kim CE, Ghate SV, Walsh R, et al. A machine learning approach to radiogenomics of breast cancer: a study of 922 subjects and 529 DCE-MRI features. Br J Cancer. (2018) 119:508–16. doi: 10.1038/s41416-018-0185-8

16. Gillies RJ, Kinahan PE, and Hricak H. Radiomics: images are more than pictures, they are data. Radiology. (2016) 278:563–77. doi: 10.1148/radiol.2015151169

17. Zhou Y, Li H, Liu J, Kong Z, Huang T, Ahn E, et al. Explicit abnormality extraction for unsupervised motion artifact reduction in magnetic resonance imaging. IEEE J Biomed Health Inform. (2025) 29:3853–63. doi: 10.1109/JBHI.2024.3444771

18. Ford J, Dogan N, Young L, and Yang F. Quantitative radiomics: impact of pulse sequence parameter selection on MRI-based textural features of the brain. Contrast Media Mol Imaging. (2018) 2018:1729071. doi: 10.1155/2018/1729071

19. Christensen-Jeffries K, Couture O, Dayton PA, Eldar YC, Hynynen K, Kiessling F, et al. Super-resolution ultrasound imaging. Ultrasound Med Biol. (2020) 46:865–91. doi: 10.1016/j.ultrasmedbio.2019.11.013

20. Gu J and Jiang T. Ultrasound radiomics in personalized breast management: Current status and future prospects. Front Oncol. (2022) 12:963612. doi: 10.3389/fonc.2022.963612

21. Johnson TR, Krauss B, Sedlmair M, Grasruck M, Bruder H, Morhard D, et al. Material differentiation by dual energy CT: initial experience. Eur Radiol. (2007) 17:1510–7. doi: 10.1007/s00330-006-0517-6

22. Fredenberg E. Spectral and dual-energy X-ray imaging for medical applications. Nucl Instrum Methods Phys Res A. (2018) 878:74–87. doi: 10.1016/j.nima.2017.07.044

23. Li J, Fang M, Wang R, Dong D, Tian J, Liang P, et al. Diagnostic accuracy of dual-energy CT-based nomograms to predict lymph node metastasis in gastric cancer. Eur Radiol. (2018) 28:5241–9. doi: 10.1007/s00330-018-5483-2

24. Yang L, Shi G, Zhou T, Li Y, and Li Y. Quantification of the iodine content of perigastric adipose tissue by dual-energy CT: A novel method for preoperative diagnosis of T4-stage gastric cancer. PLoS One. (2015) 10:e0136871. doi: 10.1371/journal.pone.0136871

25. Zhang Z, Zhao X, Gu J, et al. Spectral CT radiomics features of the tumor and perigastric adipose tissue can predict lymph node metastasis in gastric cancer. Abdom Radiol (NY). (2025) 50:3435–46. doi: 10.1007/s00261-025-04807-0

26. Alagic Z, Valls Duran C, Suzuki C, Halldorsson K, Svensson-Marcial A, Saeter R, et al. Photon-counting detector computed tomography: iodine density versus virtual monoenergetic imaging of pancreatic ductal adenocarcinoma. Abdom Radiol (NY). (2025) 50:1720–30. doi: 10.1007/s00261-024-04605-0

27. Han YE, Park BJ, Sung DJ, Kim MJ, Han NY, Sim KC, et al. Dual-layer spectral CT of pancreas ductal adenocarcinoma: can virtual monoenergetic images of the portal venous phase be an alternative to the pancreatic-phase scan? J Belg Soc Radiol. (2022) 106:83. doi: 10.5334/jbsr.2798

28. Wang L, Wu X, Tian R, Ma H, Jiang Z, Zhao W, et al. MRI-based pre-Radiomics and delta-Radiomics models accurately predict the post-treatment response of rectal adenocarcinoma to neoadjuvant chemoradiotherapy. Front Oncol. (2023) 13:1133008. doi: 10.3389/fonc.2023.1133008

29. Li W, Li Y, Liu X, Wang L, Chen W, Qian X, et al. Machine learning-based radiomics for predicting BRAF-V600E mutations in ameloblastoma. Front Immunol. (2023) 14:1180908. doi: 10.3389/fimmu.2023.1180908

30. Huang Q, Jiang Z, and Han M. Temporal trends and machine learning prediction of depressive symptoms among Chinese middle-aged and elderly individuals: a national cohort study. BMC Public Health. (2025) 25:3117. doi: 10.1186/s12889-025-24103-2

31. Xu A, Chu X, Zhang S, Zheng J, Shi D, Lv S, et al. Prediction breast molecular typing of invasive ductal carcinoma based on dynamic contrast enhancement magnetic resonance imaging radiomics characteristics: A feasibility study. Front Oncol. (2022) 12:799232. doi: 10.3389/fonc.2022.799232

32. Feng S and Yin J. Dynamic contrast-enhanced magnetic resonance imaging radiomics analysis based on intratumoral subregions for predicting luminal and nonluminal breast cancer. Quant Imaging Med Surg. (2023) 13:6735–49. doi: 10.21037/qims-22-1073

33. Wang W, Zhang X, Zhu L, Chen Y, Dou W, Zhao F, et al. Prediction of prognostic factors and genotypes in patients with breast cancer using multiple mathematical models of MR diffusion imaging. Front Oncol. (2022) 12:825264. doi: 10.3389/fonc.2022.825264

34. Liu H, Xia H, Yin X, Qin A, Zhang W, Feng S, et al. Study on the differentiation of infiltrating breast cancer molecular subtypes based on ultrasound radiomics. Clin Breast Cancer. (2025) 25:e450–60. doi: 10.1016/j.clbc.2025.01.005

35. Umutlu L, Kirchner J, Bruckmann NM, Morawitz J, Antoch G, Ingenwerth M, et al. Multiparametric integrated 18F-FDG PET/MRI-based radiomics for breast cancer phenotyping and tumor decoding. Cancers (Basel). (2021) 13:2928. doi: 10.3390/cancers13122928

36. Liu J, Wang L, Ai Z, Jian L, Yang M, Liu S, et al. A prediction model based on dual-layer spectral detector computed tomography for distinguishing nonluminal from luminal invasive breast cancer. Quant Imaging Med Surg. (2024) 14:8672–85. doi: 10.21037/qims-24-598

37. Perrone A, Lo Mele L, Sassi S, Marini M, Testaverde L, Izzo L, et al. MDCT of the breast. AJR Am J Roentgenol. (2008) 190:1644–51. doi: 10.2214/AJR.07.3145

38. Zhang J, Hao L, Xu Q, and Gao F. Radiomics and clinical characters based gaussian naive bayes (GNB) model for preoperative differentiation of pulmonary pure invasive mucinous adenocarcinoma from mixed mucinous adenocarcinoma. Technol Cancer Res Treat. (2024) 23:15330338241258415. doi: 10.1177/15330338241258415

39. Wang Y, Bai G, Huang M, and Chen W. Machine learning model based on enhanced CT radiomics for the preoperative prediction of lymphovascular invasion in esophageal squamous cell carcinoma. Front Oncol. (2024) 14:1308317. doi: 10.3389/fonc.2024.1308317

40. Wu J, Ge L, Jin Y, Wang Y, Hu L, Xu D, et al. Development and validation of an ultrasound-based radiomics nomogram for predicting the luminal from non-luminal type in patients with breast carcinoma. Front Oncol. (2022) 12:993466. doi: 10.3389/fonc.2022.993466

41. Wang F, Wang D, Xu Y, Jiang H, Liu Y, Zhang J, et al. Potential of the non-contrast-enhanced chest CT radiomics to distinguish molecular subtypes of breast cancer: A retrospective study. Front Oncol. (2022) 12:848726. doi: 10.3389/fonc.2022.848726

42. Bian X, Du S, Yue Z, Gao S, Zhao R, Huang G, et al. Potential antihuman epidermal growth factor receptor 2 target therapy beneficiaries: the role of MRI-based radiomics in distinguishing human epidermal growth factor receptor 2-low status of breast cancer. J Magn Reson Imaging. (2023) 58:1603–14. doi: 10.1002/jmri.28628

43. Yang M, Liu H, Dai Q, Yao L, Zhang S, Wang Z, et al. Treatment response prediction using ultrasound-based pre-, post-early, and delta radiomics in neoadjuvant chemotherapy in breast cancer. Front Oncol. (2022) 12:748008. doi: 10.3389/fonc.2022.748008

44. Zhou J, Lu J, Gao C, Zeng J, Zhou C, Lai X, et al. Predicting the response to neoadjuvant chemotherapy for breast cancer: wavelet transforming radiomics in MRI. BMC Cancer. (2020) 20:100. doi: 10.1186/s12885-020-6523-2

45. Jeh SK, Kim SH, Kim HS, Kang BJ, Jeong SH, Yim HW, et al. Correlation of the apparent diffusion coefficient value and dynamic magnetic resonance imaging findings with prognostic factors in invasive ductal carcinoma. J Magn Reson Imaging. (2011) 33:102–9. doi: 10.1002/jmri.22400

46. Liu L, Mei N, Yin B, and Peng W. Correlation of DCE-MRI perfusion parameters and molecular biology of breast infiltrating ductal carcinoma. Front Oncol. (2021) 11:561735. doi: 10.3389/fonc.2021.561735

47. Gu WQ, Cai SM, Liu WD, Zhang Q, Shi Y, and Du LJ. Combined molybdenum target X-ray and magnetic resonance imaging examinations improve breast cancer diagnostic efficacy. World J Clin Cases. (2022) 10:485–91. doi: 10.12998/wjcc.v10.i2.485

Keywords: breast cancer, dual-layer detector spectral CT, machine learning, molecular subtype, radiomics

Citation: Song Z, Ma Y, Dou Z and Shi B (2026) Machine learning model based on dual-layer detector spectral CT radiomics features for differentiating luminal and non-luminal breast cancer. Front. Oncol. 16:1739346. doi: 10.3389/fonc.2026.1739346

Received: 04 November 2025; Revised: 11 January 2026; Accepted: 20 January 2026; Published: 05 February 2026.

Edited by:

Renu Popli, Chitkara University, India

Reviewed by:

Nazmul Ahasan Maruf, King Abdulaziz University, Saudi Arabia
Chetna Sharma, Chitkara University, India

Copyright © 2026 Song, Ma, Dou and Shi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Bo Shi, shibo@bbmu.edu.cn
