ORIGINAL RESEARCH article

Front. Physiol., 01 May 2026

Sec. Computational Physiology and Medicine

Volume 17 - 2026 | https://doi.org/10.3389/fphys.2026.1820868

A 2.5D multichannel deep learning model using contrast-enhanced ultrasound for predicting malignancy in breast nodules: a two-center study

  • 1. Department of Ultrasound, Southern University of Science and Technology Hospital, Shenzhen, China

  • 2. Department of Ultrasound Medicine, Shenzhen Hospital of Southern Medical University, Shenzhen, China

  • 3. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China

  • 4. Department of Ultrasound, Shenzhen Baoan AirSea Hospital, Shenzhen, China

Abstract

Objective:

To evaluate a novel multichannel deep learning (DL) model using contrast-enhanced ultrasound (CEUS) data with multiple regions of interest (ROIs) and time-intensity curve (TIC)-derived key frames for predicting breast nodule malignancy. Clinical features were integrated into a combined model for robust, generalizable breast lesion classification. The model was further evaluated as an AI-assisted decision support tool through direct comparison with BI-RADS classification by senior radiologists.

Methods:

This retrospective two-center study enrolled 141 patients with breast nodules: 89 from Institution 1 (June 2016–October 2017; training cohort, n=62; internal validation, n=27) and 52 from Institution 2 (November 2022–November 2024; external validation). BI-RADS categories were extracted from original radiology reports and binarized at ≥4B for malignancy prediction. Tumors were segmented on B-mode and CEUS images to define intratumoral ROIs, tumor bounding boxes, and peritumoral expansions (2 mm and 5 mm). TIC phases (initial, ascending, peak, descending, wash-out) were stacked into multichannel 2.5-dimensional (2.5D) inputs. DenseNet201 models, pretrained on ImageNet, were trained for 2D and 2.5D DL across ROI types. Outputs from the clinical model and optimal intratumoral plus 2-mm peritumoral ROI models were fused via logistic regression. Performance was evaluated using area under the receiver operating characteristic curve (AUC), Hosmer–Lemeshow calibration, decision curve analysis (DCA), and the DeLong test for comparison with BI-RADS.

Results:

Among the 2.5D models, the multichannel variant with intratumoral plus 2-mm peritumoral ROI showed the highest external validation performance. The combined model, constructed by fusing the output of the optimal MultiChannel_2.5D_DL architecture (intratumoral + 2-mm peritumoral ROI) with the 2D_DL and clinical models via logistic regression, outperformed the individual models externally (AUC 0.949 [95% CI: 0.888, 1.000] vs. clinical AUC 0.821 [95% CI: 0.671, 0.970], p=0.04; vs. 2D AUC 0.789 [95% CI: 0.660, 0.918], p=0.01; vs. 2.5D AUC 0.824 [95% CI: 0.677, 0.972], p=0.03). In direct comparison in the external validation cohort, the combined model demonstrated diagnostic performance comparable to that of senior radiologists (AUC 0.949 [95% CI: 0.888, 1.000] vs. 0.897 [95% CI: 0.808, 0.986], p=0.15).

Conclusion:

This combined model, integrating the optimal MultiChannel_2.5D_DL output with 2D_DL and clinical features, offers promising accuracy and generalizability as a decision support tool for CEUS-based breast nodule malignancy prediction, potentially assisting radiologists in reducing interobserver variability and unnecessary biopsies.

1 Introduction

Breast cancer remains a leading cause of morbidity and mortality among women worldwide, underscoring the need for accurate and non-invasive diagnostic tools to differentiate benign from malignant breast nodules (Giaquinto et al., 2022). Ultrasonography, particularly contrast-enhanced ultrasound (CEUS), has emerged as a valuable imaging modality due to its ability to provide real-time visualization of vascular perfusion patterns, which are often distinct between benign and malignant lesions (Wang et al., 2022). However, the diagnostic accuracy of CEUS relies heavily on subjective interpretation, which is limited by interobserver variability and the complexity of lesion characteristics (Chen et al., 2019; Zhou et al., 2020). To address these challenges, advanced computational approaches, such as deep learning, have been increasingly integrated into medical imaging to enhance diagnostic precision (Aggarwal et al., 2021; Mridha et al., 2021). Recent advances in deep learning have shown promising results in medical image analysis, particularly for the classification of breast lesions (Al-antari et al., 2020; Mahmood et al., 2022). Deep learning models, such as convolutional neural networks (CNNs), can automatically extract and learn complex features from imaging data, offering potential improvements over traditional radiologist-based assessments (Jiang et al., 2021). However, conventional two-dimensional (2D) deep learning models may not fully capture the dynamic temporal and spatial information provided by CEUS. Multichannel deep learning approaches, which incorporate multiple image frames or regions of interest (ROIs) derived from CEUS, can better leverage this dynamic information, thereby improving diagnostic performance. Nevertheless, few studies have explored the use of a DL model for the differentiation of benign and malignant breast nodules using CEUS.

This study aims to evaluate the diagnostic performance of a novel multichannel deep learning model based on CEUS data, utilizing multiple ROI types and time-intensity curve (TIC)-derived key frames, for predicting the malignancy of breast nodules. By integrating clinical features into a clinical-deep learning combined model, we seek to develop a robust and generalizable tool for breast lesion classification. The core innovation of this work lies in the proposed MultiChannel_2.5D_DL model, which employs a unique multichannel input strategy based on TIC key frames and systematically explores different ROI configurations (intratumoral and peritumoral) to optimize diagnostic performance.

2 Materials and methods

2.1 Patients

This retrospective study analyzed data from patients who underwent breast ultrasonography at Shenzhen Hospital of Southern Medical University and Southern University of Science and Technology Hospital. The Institutional Review Boards of both participating institutions approved the study, and informed consent was waived. The inclusion criteria were as follows: (a) no contraindication to ultrasound contrast agents; (b) B-mode and contrast-enhanced ultrasound examinations of breast tumors that were histopathologically confirmed by core biopsy, or of benign breast tumors confirmed either histopathologically or by 2-year follow-up; (c) available clinical data. The exclusion criteria were as follows: (a) pregnant or lactating women; (b) any preoperative interventions or therapies (e.g., radiotherapy, chemotherapy, radiofrequency ablation) before the US examination; (c) incomplete clinical or imaging data; (d) a target tumor that was unclear or had no visible ROI on US images due to artifacts.

The training and internal validation datasets were obtained from Shenzhen Hospital of Southern Medical University between June 2016 and October 2017 using a Philips EPIQ5 ultrasound system, comprising a total of 89 patients with breast nodules, and were randomly split into a training cohort (n = 62) and an internal validation cohort (n = 27) at a 7:3 ratio. The external validation dataset was collected from Southern University of Science and Technology Hospital between November 2022 and November 2024 using a Mindray Nuewa R9T ultrasound system, comprising 52 patients. Figure 1 illustrates a flowchart delineating the patient inclusion process.

Figure 1

2.2 Clinical variables

The clinical variables included patient age, maximum diameter of the breast nodule, composition, echo pattern, posterior acoustic features, aspect ratio, boundary of enhancement, margin characteristics, calcifications, pathological data, and color Doppler flow imaging (CDFI) blood flow, which was graded using the Adler classification (Adler et al., 1990).

2.3 Imaging protocols

All examinations were conducted by sonographers with more than five years of dedicated experience in breast ultrasound, each having undergone specialized training to harmonize the interpretation of key imaging features and to ensure proficiency in using structured reporting templates. B-mode US and CEUS were obtained and archived following established protocols. Qualitative feature evaluations (e.g., assessment of margins or signs of invasion) were independently carried out by two radiologists from the breast subspecialty section of the ultrasound department, with both blinded to patient outcomes to maintain assessment consistency. Disagreements were settled through joint review with a senior radiologist. This standardized workflow ensured consistent recording of conventional ultrasound characteristics across the entire patient cohort. After B-mode imaging, CEUS was performed with a low mechanical index (approximately 0.06) to assess tumor perfusion. An intravenous bolus of 4.8 mL of sulfur hexafluoride microbubble contrast agent (SonoVue®) was administered, immediately followed by a 5 mL saline flush. The dynamic enhancement process of the lesion was continuously monitored and stored for 1–2 minutes. For subsequent analysis, the imaging plane displaying either the largest cross-section of the tumor or the most abundant vascularization was selected.

2.4 Image processing and standardization

In our study, the ROI was manually delineated using ITK-SNAP 4.0.2, with annotations performed by two experienced radiologists. Any discrepancies in their annotations were resolved by a senior radiologist with over 20 years of experience. For each patient, we selected the slice presenting the largest ROI as the representative image. Manual segmentation of tumors was performed on B-mode ultrasound and CEUS images to generate intratumoral ROIs and tumor bounding boxes. Each intratumoral segment was then expanded by an additional 2 mm and 5 mm to define the peritumoral ROIs.
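For illustration, the millimeter-scale peritumoral expansion can be implemented as a morphological dilation of the binary tumor mask, converting the physical margin into pixels via the image's pixel spacing. The sketch below is a minimal example, assuming a mask exported from ITK-SNAP and a known in-plane spacing; the OpenCV-based helper is illustrative rather than the authors' actual implementation.

```python
import cv2
import numpy as np

def expand_mask_mm(mask: np.ndarray, margin_mm: float, pixel_spacing_mm: float) -> np.ndarray:
    """Dilate a binary tumor mask (uint8, 1 inside the tumor) by a physical margin in mm."""
    radius_px = max(1, int(round(margin_mm / pixel_spacing_mm)))
    kernel = cv2.getStructuringElement(
        cv2.MORPH_ELLIPSE, (2 * radius_px + 1, 2 * radius_px + 1))
    return cv2.dilate(mask, kernel)

# Peritumoral ring = expanded region minus the original tumor, e.g. for the 2-mm ROI:
# peri_2mm = expand_mask_mm(tumor_mask, 2.0, spacing_mm) - tumor_mask
```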

At Institution 1 (2016–2017), imaging was performed using a Philips EPIQ5 system with a 9 MHz linear probe, while at Institution 2 (2022–2024), a Mindray Nuewa R9T system with a 12 MHz linear probe was used. All inputs were resized to 1024 × 768 pixels and normalized using the mean and standard deviation values from the ImageNet dataset. Importantly, the multichannel 2.5D deep learning model primarily relied on relative changes in time-intensity curves across multiple intratumoral and peritumoral ROIs rather than absolute grayscale intensity values. This design inherently improves the model’s robustness to inter-manufacturer and temporal variations in ultrasound equipment.
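For reference, the resizing and normalization described above correspond to a standard torchvision preprocessing pipeline; the hard-coded statistics are the published ImageNet channel means and standard deviations, and the file name is a placeholder. For the five-channel 2.5D stacks, the per-channel statistics would need to be tiled or recomputed accordingly.

```python
from PIL import Image
import torchvision.transforms as T

preprocess = T.Compose([
    T.Resize((768, 1024)),  # torchvision expects (height, width); the paper reports 1024 x 768 pixels
    T.ToTensor(),           # [0, 255] -> [0.0, 1.0]
    T.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics, as stated above
                std=[0.229, 0.224, 0.225]),
])

frame = preprocess(Image.open("frame_peak.png").convert("RGB"))
```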

2.5 Multi-channel image fusion

The ROI delineation is important for processing contrast-enhanced ultrasound data. The key issue in identifying the ROI in the raw TIC dataset is to pinpoint the critical phases of contrast dynamics. TIC patterns demonstrate that malignant lesions typically exhibit rapid wash-in followed by rapid wash-out or rapid wash-in followed by slow wash-out (Jung et al., 2021). In each patient, the contrast-enhanced ultrasound (CEUS) video was recorded continuously for 1–2 minutes and typically contained approximately 100–200 frames, depending on the frame rate and recording duration. To capture the essential dynamic enhancement patterns while reducing redundancy and computational burden, five representative key frames were manually selected by experienced radiologists according to the phases of the TIC: the initial stage, ascending stage, peak stage, descending stage, and wash-out stage. The selected key frames from these phases resulted in a collection of 2D images (five frames per patient) that were centered on the primary temporal ROI dynamics and adequately represented the perfusion characteristics critical for malignancy prediction.

Across the entire cohort of 141 patients (89 from Institution 1 and 52 from Institution 2), this process yielded a total of 705 selected key frame images (five per patient). The integration process involved stacking these frames as multi-channel 2D images to incorporate information from adjacent temporal phases, which served as the final input to the deep learning model. Because the model uses multiple stacked 2D images and thus occupies a middle ground between 2D and 3-dimensional (3D) processing, the approach is referred to as a 2.5-dimensional (2.5D) imaging technique. The model structure is shown in Figure 2.
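A minimal sketch of this channel-stacking step, assuming the five key-frame ROI crops have already been registered to a common size; the phase names and function are illustrative.

```python
from typing import Dict
import numpy as np

TIC_PHASES = ["initial", "ascending", "peak", "descending", "washout"]

def stack_keyframes(frames: Dict[str, np.ndarray]) -> np.ndarray:
    """Stack the five TIC key-frame crops (each H x W) into a (5, H, W) array,
    ordered by enhancement phase, forming the multichannel 2.5D input."""
    return np.stack([frames[phase] for phase in TIC_PHASES], axis=0)
```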

Figure 2

2.6 Model training

The deep learning model was based on DenseNet201 as the backbone convolutional neural network. The model was trained end-to-end on both 2.5-dimensional inputs (multi-channel 2D images comprising five stacked key frames per patient) and standard 2D inputs. No additional data augmentation techniques were applied, and no specific measures were taken to address class imbalance.

Training was performed using stochastic gradient descent as the optimizer with cross-entropy as the loss function. The batch size was set to 32, and the model was trained for 40 epochs without early stopping. Random seeds were not fixed across experiments. All training and evaluation were conducted on a single NVIDIA GeForce RTX 4060 GPU.
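The setup above can be sketched in PyTorch as follows. The first-layer adaptation is an assumption, since the text does not specify how the five-channel input was wired into DenseNet201, and the learning rate and momentum shown are placeholders (the staged cosine decay schedule is given in Section 2.8).

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.densenet201(pretrained=True)  # ImageNet-pretrained backbone

# Replace the 3-channel stem with a 5-channel one for the stacked key frames;
# initialize the new filters from the averaged pretrained RGB filters.
old_conv = model.features.conv0
new_conv = nn.Conv2d(5, old_conv.out_channels, kernel_size=7,
                     stride=2, padding=3, bias=False)
with torch.no_grad():
    new_conv.weight.copy_(
        old_conv.weight.mean(dim=1, keepdim=True).repeat(1, 5, 1, 1))
model.features.conv0 = new_conv

model.classifier = nn.Linear(model.classifier.in_features, 2)  # benign vs. malignant

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(40):                # 40 epochs, batch size 32, no early stopping
    for x, y in train_loader:          # x: (32, 5, H, W) stacked key frames (assumed loader)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```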

For comparison, clinical models were constructed using the k-nearest neighbors (KNN) algorithm.
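The text does not report the KNN configuration; a plausible scikit-learn setup, with k=5 and feature standardization as assumptions and placeholder variable names, is:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X_clin_*: encoded clinical features (age, shape, boundary of enhancement, ...)
clinical_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clinical_model.fit(X_clin_train, y_train)
clin_prob = clinical_model.predict_proba(X_clin_external)[:, 1]  # P(malignant)
```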

Finally, a combined model was developed to integrate all available predictive factors, incorporating outputs from both the 2D_DL and MultiChannel_2.5D_DL models as well as the clinical model, thereby enabling a comprehensive evaluation of enhanced predictive performance. The research flowchart is illustrated in Figure 3.
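A minimal sketch of this late-fusion step, assuming each base model contributes its predicted malignancy probability (variable names are placeholders; whether probabilities or logits were fused is not specified in the text):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fit the fusion model on the training-cohort outputs of the three base models.
fusion = LogisticRegression()
fusion.fit(np.column_stack([p_2d_train, p_25d_train, p_clin_train]), y_train)

# Combined malignancy probability on the external validation cohort.
combined_prob = fusion.predict_proba(
    np.column_stack([p_2d_ext, p_25d_ext, p_clin_ext]))[:, 1]
```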

Figure 3

To visualize the image regions contributing most significantly to the model’s predictions, we employed the Gradient-weighted Class Activation Mapping (Grad-CAM) algorithm to generate heatmaps. Specifically, Grad-CAM was applied to the final convolutional layer of the model, yielding heatmaps with optimal spatial resolution.
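A compact Grad-CAM sketch consistent with this description, hooking the final convolutional features of DenseNet201; this is an illustrative implementation, not the authors' code.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_class):
    """Grad-CAM heatmap from the final convolutional block.
    x: (1, C, H, W) input; returns an (H, W) map scaled to [0, 1]."""
    acts, grads = {}, {}
    layer = model.features  # last conv features before pooling/classifier
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

    model.zero_grad()
    model(x)[0, target_class].backward()
    h1.remove(); h2.remove()

    weights = grads["g"].mean(dim=(2, 3), keepdim=True)        # per-channel importance
    cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach()
```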

2.7 Reproducibility and data availability

All key model parameters, including the DenseNet201 backbone, 2.5-dimensional multi-channel inputs, stochastic gradient descent optimizer, batch size of 32, 40 training epochs, and RTX 4060 GPU environment, are fully described in the Model Training subsection. All data analyses were executed on the OnekeyAI platform (version 4.9.1) utilizing Python 3.7.12, with statistical assessments conducted using Statsmodels version 0.13.2. Random seeds were not fixed. The complete code for preprocessing, model training, and inference is available from the corresponding author upon reasonable request.

All patient data were fully de-identified prior to analysis in accordance with the Declaration of Helsinki and institutional review board approval. No identifiable information was retained. The de-identified dataset is not publicly available due to institutional data protection policies but can be accessed from the corresponding author upon reasonable request, subject to approval by the institutional ethics committee and execution of a formal data use agreement.

2.8 Hyper parameters

To enhance the model’s performance across diverse patient populations characterized by substantial heterogeneity, we adopted a transfer learning paradigm. This entailed initializing the model with pre-trained weights derived from the ImageNet dataset, thereby augmenting its capacity for adaptation to varied datasets. A pivotal component of our methodology involved the judicious optimization of the learning rate to promote superior generalization across datasets. Accordingly, we utilized a cosine decay learning rate schedule, formulated as follows:

$$\eta_t = \eta_{\min}^{i} + \frac{1}{2}\left(\eta_{\max}^{i} - \eta_{\min}^{i}\right)\left(1 + \cos\left(\frac{T_{\text{cur}}}{T_i}\pi\right)\right)$$

where $\eta_t$ is the learning rate at the current training step $t$, $\eta_{\min}^{i}$ and $\eta_{\max}^{i}$ are the minimum and maximum learning rates in the i-th stage, respectively, $T_{\text{cur}}$ is the current epoch number, and $T_i$ is the total number of epochs in the i-th stage.
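In PyTorch, a staged cosine decay of this form corresponds to the built-in warm-restart scheduler; T_0 (epochs per stage) and eta_min below are illustrative values, as they are not reported in the text.

```python
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, eta_min=1e-5)

for epoch in range(40):
    train_one_epoch(model, train_loader, optimizer)  # assumed training helper
    scheduler.step()                                 # advances the cosine schedule
```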

2.9 Clinical reference standard

BI-RADS categories were extracted from original radiology reports. For diagnostic performance comparison with the DL models, BI-RADS 3 and 4A were defined as benign lesions, and 4B, 4C, and 5 as malignant lesions (consistent with clinical practice for comparison with senior radiologists). All performance metrics were calculated on the external validation set.
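The cutoff rule is simple enough to state directly in code; a minimal sketch:

```python
# BI-RADS 3/4A -> benign (0); 4B/4C/5 -> malignant (1), per the cutoff above.
MALIGNANT_CATEGORIES = {"4B", "4C", "5"}

def birads_to_label(category: str) -> int:
    return int(category in MALIGNANT_CATEGORIES)
```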

2.10 Statistical analysis

Normality of continuous variables was assessed with the Shapiro–Wilk test. Group comparisons for continuous variables were performed using the independent-samples t test or the Mann–Whitney U test, as appropriate. Categorical variables were compared using the χ² test. A p-value greater than 0.05 indicated no significant difference between groups, suggesting that the dataset allocation was balanced.

Model performance in the test cohort was evaluated by receiver operating characteristic (ROC) analysis, with the area under the curve (AUC) used to quantify discrimination. Calibration was assessed with calibration plots and the Hosmer–Lemeshow goodness-of-fit test. Clinical usefulness was examined using decision curve analysis (DCA) to estimate net benefit across a range of decision thresholds. Threshold probabilities ranging from 0 to 1.0 were evaluated to provide a comprehensive assessment of net benefit across the entire spectrum of possible clinical decision thresholds (Vickers et al., 2008). All data analyses were executed on the OnekeyAI platform version 4.9.1 utilizing Python 3.7.12. Statistical assessments were conducted with Statsmodels version 0.13.2.
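For reference, the net benefit underlying the DCA is computed at each threshold probability p_t as TP/n - (FP/n) * p_t/(1 - p_t) (Vickers et al., 2008); a minimal sketch with placeholder variable names:

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted probability exceeds threshold."""
    pred = y_prob >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    n = len(y_true)
    return tp / n - fp / n * threshold / (1 - threshold)

thresholds = np.linspace(0.01, 0.99, 99)
curve = [net_benefit(y_ext, combined_prob, t) for t in thresholds]
```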

3 Results

3.1 Patient cohort

This study enrolled 141 patients with breast nodules, aged 19–79 years (mean age, 43.06 ± 12.16 years [standard deviation]). These patients were divided into three cohorts: a training cohort (n=62), an internal validation cohort (n=27), and an external validation cohort (n=52). The pathological results showed 40 benign and 22 malignant nodules in the training cohort, 17 benign and 10 malignant nodules in the internal validation cohort, and 39 benign and 13 malignant nodules in the external validation cohort. Clinical data are detailed in Table 1. In the training cohort, significant differences were observed between benign and malignant nodules in age, shape, boundary of enhancement, posterior acoustic shadowing, margin, and orientation (p < 0.05); these variables were subsequently included in the clinical model. The model achieved an area under the receiver operating characteristic curve (AUC) of 0.742 (95% confidence interval [CI]: 0.611, 0.874) in the training cohort. Comparable discriminative performance was observed in the external validation cohort, with an AUC of 0.821 (95% CI: 0.671, 0.970).

Table 1

| Clinical feature | Training: benign (n=40) | Training: malignant (n=22) | P value | Internal validation: benign (n=17) | Internal validation: malignant (n=10) | P value | External validation: benign (n=39) | External validation: malignant (n=13) | P value |
|---|---|---|---|---|---|---|---|---|---|
| Age (y) | 40.27 ± 11.28 | 48.77 ± 12.44 | 0.006 | 42.65 ± 14.37 | 44.00 ± 9.26 | 0.793 | 40.92 ± 11.96 | 48.15 ± 10.88 | 0.06 |
| Lesion diameter (mm) | 18.77 ± 10.25 | 25.00 ± 14.14 | 0.082 | 17.18 ± 6.84 | 19.00 ± 12.53 | 0.94 | 10.86 ± 7.84 | 18.23 ± 6.64 | <0.001 |
| Echo pattern |  |  | 0.658 |  |  | 0.251 |  |  | 0.303 |
|  Isoechoic | 2 (5.00) | 1 (4.55) |  | 1 (5.88) | 0 |  | 0 | 0 |  |
|  Mixed echogenicity | 7 (17.50) | 2 (9.09) |  | 3 (17.65) | 0 |  | 1 (2.56) | 2 (15.38) |  |
|  Hypoechoic | 31 (77.50) | 19 (86.36) |  | 13 (76.47) | 10 (100.00) |  | 38 (97.44) | 11 (84.62) |  |
| Shape |  |  | 0.028 |  |  | 1.0 |  |  | 0.025 |
|  Regular | 13 (32.50) | 1 (4.55) |  | 2 (11.76) | 1 (10.00) |  | 22 (56.41) | 2 (15.38) |  |
|  Irregular | 27 (67.50) | 21 (95.45) |  | 15 (88.24) | 9 (90.00) |  | 17 (43.59) | 11 (84.62) |  |
| Microcalcification |  |  | 1.0 |  |  | 0.202 |  |  | 0.618 |
|  Absent | 9 (22.50) | 5 (22.73) |  | 9 (52.94) | 2 (20.00) |  | 26 (66.67) | 7 (53.85) |  |
|  Present | 31 (77.50) | 17 (77.27) |  | 8 (47.06) | 8 (80.00) |  | 13 (33.33) | 6 (46.15) |  |
| Boundary of enhancement |  |  | <0.001 |  |  | 0.003 |  |  | <0.001 |
|  Clear | 28 (70.00) | 2 (9.09) |  | 13 (76.47) | 1 (10.00) |  | 37 (94.87) | 6 (46.15) |  |
|  Fuzzy | 12 (30.00) | 20 (90.91) |  | 4 (23.53) | 9 (90.00) |  | 2 (5.13) | 7 (53.85) |  |
| Posterior acoustic shadowing |  |  | 0.029 |  |  | 0.248 |  |  | 0.241 |
|  Absent | 36 (90.00) | 14 (63.64) |  | 17 (100.00) | 8 (80.00) |  | 32 (82.05) | 13 (100.00) |  |
|  Present | 4 (10.00) | 8 (36.36) |  | 0 | 2 (20.00) |  | 7 (17.95) | 0 |  |
| Margin |  |  | <0.001 |  |  | 0.057 |  |  | 0.057 |
|  Circumscribed | 17 (42.50) | 0 |  | 7 (41.18) | 0 |  | 12 (30.77) | 0 |  |
|  Non-circumscribed | 23 (57.50) | 22 (100.00) |  | 10 (58.82) | 10 (100.00) |  | 27 (69.23) | 13 (100.00) |  |
| Orientation |  |  | <0.001 |  |  | 0.002 |  |  | 1.0 |
|  Parallel | 36 (90.00) | 10 (45.45) |  | 16 (94.12) | 3 (30.00) |  | 34 (87.18) | 11 (84.62) |  |
|  Non-parallel | 4 (10.00) | 12 (54.55) |  | 1 (5.88) | 7 (70.00) |  | 5 (12.82) | 2 (15.38) |  |
| Vascularity |  |  | 0.099 |  |  | 0.481 |  |  | <0.001 |
|  Absent | 21 (52.50) | 6 (27.27) |  | 7 (41.18) | 2 (20.00) |  | 26 (66.67) | 1 (7.69) |  |
|  Present | 19 (47.50) | 16 (72.73) |  | 10 (58.82) | 8 (80.00) |  | 13 (33.33) | 12 (92.31) |  |

Values are mean ± standard deviation for continuous variables and n (%) for categorical variables.

Clinical and nodule characteristics of the study cohorts.

*p < 0.05, two-sided false discovery rate corrected.

3.2 Development and performance of deep learning models

Based on different ROI types, DenseNet201 was employed to generate 2D_DL and MultiChannel_2.5D_DL models, respectively. In the bounding box models, the MultiChannel_2.5D_DL model achieved an AUC of 0.808 (95% CI: 0.696, 0.920) in the training cohort, which was higher than that of the 2D_DL model (AUC, 0.667; 95% CI: 0.523, 0.810) (p = 0.13); however, the 2D_DL model outperformed the MultiChannel_2.5D_DL model in the internal validation cohort [AUC 0.782 (95% CI: 0.603, 0.962) vs. 0.624 (95% CI: 0.389, 0.859)] (p = 0.33) and the external validation cohort [AUC 0.751 (95% CI: 0.584, 0.919) vs. 0.558 (95% CI: 0.342, 0.774)] (p = 0.13). In the intratumoral models, the 2D_DL model demonstrated superior performance across all validation sets compared with the MultiChannel_2.5D_DL model (mean p = 0.36). However, comparisons of different peritumoral ROI sizes in the MultiChannel_2.5D_DL models revealed that the intratumoral model combined with a 2-mm peritumoral ROI yielded the highest predictive performance. For the intratumoral model combined with the 2-mm peritumoral ROI, the MultiChannel_2.5D_DL model (AUC, 0.776; 95% CI: 0.654, 0.898) outperformed the clinical model (AUC, 0.742; 95% CI: 0.611, 0.874) (p = 0.70) but was inferior to the 2D_DL model (AUC, 0.892; 95% CI: 0.800, 0.984) (p = 0.08) in the training cohort; in the internal validation cohort, the MultiChannel_2.5D_DL model (AUC, 0.712; 95% CI: 0.511, 0.912) outperformed both the clinical model (AUC, 0.553; 95% CI: 0.312, 0.791) (p = 0.17) and the 2D_DL model (AUC, 0.688; 95% CI: 0.466, 0.911) (p = 0.88); in the external validation cohort, the MultiChannel_2.5D_DL model (AUC, 0.824; 95% CI: 0.677, 0.972) outperformed both the clinical model (AUC = 0.821; 95% CI: 0.671, 0.970) (p = 0.96) and the 2D_DL model (AUC = 0.789; 95% CI: 0.660, 0.918) (p = 0.71). The detailed statistical results of the DL models are presented in Table 2.

Table 2

| Model name | Set | Accuracy | AUC (95% CI) | Sensitivity | Specificity | PPV | NPV | F1 |
|---|---|---|---|---|---|---|---|---|
| Intra MultiChannel_2.5D_DL | Training | 0.597 | 0.545 (0.394-0.697) | 0.727 | 0.525 | 0.457 | 0.778 | 0.561 |
| Intra MultiChannel_2.5D_DL | Internal validation | 0.741 | 0.553 (0.295-0.811) | 0.300 | 1.000 | 1.000 | 0.708 | 0.462 |
| Intra MultiChannel_2.5D_DL | External validation | 0.654 | 0.602 (0.401-0.802) | 0.538 | 0.692 | 0.368 | 0.818 | 0.437 |
| Intra 2D_DL | Training | 0.758 | 0.724 (0.586-0.862) | 0.500 | 0.900 | 0.733 | 0.766 | 0.595 |
| Intra 2D_DL | Internal validation | 0.741 | 0.588 (0.345-0.832) | 0.500 | 0.882 | 0.714 | 0.750 | 0.588 |
| Intra 2D_DL | External validation | 0.673 | 0.755 (0.610-0.901) | 0.923 | 0.590 | 0.429 | 0.958 | 0.585 |
| Bounding box MultiChannel_2.5D_DL | Training | 0.758 | 0.808 (0.696-0.920) | 0.864 | 0.700 | 0.613 | 0.903 | 0.717 |
| Bounding box MultiChannel_2.5D_DL | Internal validation | 0.593 | 0.624 (0.389-0.859) | 0.900 | 0.412 | 0.474 | 0.875 | 0.621 |
| Bounding box MultiChannel_2.5D_DL | External validation | 0.731 | 0.558 (0.342-0.774) | 0.538 | 0.795 | 0.467 | 0.838 | 0.500 |
| Bounding box 2D_DL | Training | 0.645 | 0.667 (0.524-0.810) | 0.818 | 0.550 | 0.500 | 0.846 | 0.621 |
| Bounding box 2D_DL | Internal validation | 0.741 | 0.782 (0.603-0.962) | 0.900 | 0.647 | 0.600 | 0.917 | 0.720 |
| Bounding box 2D_DL | External validation | 0.750 | 0.751 (0.584-0.919) | 0.692 | 0.769 | 0.500 | 0.882 | 0.581 |
| Intra+2 mm-peri MultiChannel_2.5D_DL | Training | 0.694 | 0.776 (0.654-0.898) | 0.864 | 0.600 | 0.543 | 0.889 | 0.667 |
| Intra+2 mm-peri MultiChannel_2.5D_DL | Internal validation | 0.704 | 0.712 (0.511-0.912) | 1.000 | 0.529 | 0.556 | 1.000 | 0.714 |
| Intra+2 mm-peri MultiChannel_2.5D_DL | External validation | 0.808 | 0.824 (0.677-0.972) | 0.769 | 0.821 | 0.588 | 0.914 | 0.667 |
| Intra+2 mm-peri 2D_DL | Training | 0.887 | 0.892 (0.800-0.984) | 0.818 | 0.925 | 0.857 | 0.902 | 0.837 |
| Intra+2 mm-peri 2D_DL | Internal validation | 0.741 | 0.688 (0.466-0.911) | 0.500 | 0.882 | 0.714 | 0.750 | 0.588 |
| Intra+2 mm-peri 2D_DL | External validation | 0.692 | 0.789 (0.660-0.918) | 0.923 | 0.615 | 0.444 | 0.960 | 0.600 |
| Intra+5 mm-peri MultiChannel_2.5D_DL | Training | 0.790 | 0.831 (0.725-0.936) | 0.727 | 0.825 | 0.696 | 0.846 | 0.711 |
| Intra+5 mm-peri MultiChannel_2.5D_DL | Internal validation | 0.815 | 0.712 (0.478-0.946) | 0.500 | 1.000 | 1.000 | 0.773 | 0.667 |
| Intra+5 mm-peri MultiChannel_2.5D_DL | External validation | 0.692 | 0.623 (0.434-0.813) | 0.538 | 0.744 | 0.412 | 0.829 | 0.467 |
| Intra+5 mm-peri 2D_DL | Training | 0.629 | 0.628 (0.484-0.773) | 0.773 | 0.550 | 0.486 | 0.815 | 0.596 |
| Intra+5 mm-peri 2D_DL | Internal validation | 0.815 | 0.841 (0.686-0.997) | 0.900 | 0.765 | 0.692 | 0.929 | 0.783 |
| Intra+5 mm-peri 2D_DL | External validation | 0.750 | 0.700 (0.540-0.860) | 0.538 | 0.821 | 0.500 | 0.842 | 0.519 |
| Clinical | Training | 0.710 | 0.742 (0.611-0.874) | 0.727 | 0.700 | 0.571 | 0.824 | 0.640 |
| Clinical | Internal validation | 0.630 | 0.553 (0.312-0.791) | 0.400 | 0.765 | 0.500 | 0.684 | 0.444 |
| Clinical | External validation | 0.808 | 0.821 (0.671-0.970) | 0.769 | 0.821 | 0.588 | 0.914 | 0.667 |

Performances of the predictive models in the study cohorts.

2D, 2-dimensional; 2.5D, 2.5-dimensional; AUC, area under the receiver operating characteristic curve; DL, deep learning; NPV, negative predictive value; PPV, positive predictive value; F1, F1-score.

3.3 Combined model and performance evaluation

To demonstrate the clinical superiority of the proposed MultiChannel_2.5D_DL model (intratumoral + 2-mm peritumoral ROI), we developed a combined model by integrating its output with the 2D_DL and clinical models via logistic regression. This model is aimed at differentiating benign from malignant breast nodules. The AUC values in the training cohort, internal validation cohort, and external validation cohort were 0.975 (95% CI: 0.945, 1.000), 0.929 (95% CI: 0.837, 1.000), and 0.949 (95% CI: 0.888, 1.000), respectively. The ROC curves for the combined model and individual models are shown in Figures 4A–C. The diagnostic performance metrics of this model are detailed in Table 3. The DeLong test indicated that the combined model significantly outperformed the individual model signatures (mean p < 0.05) (Figures 4D–F). Additionally, the calibration curves for the combined model demonstrated good calibration in the training, internal validation, and external validation sets (Figures 5A–C). Figures 5D–F present the DCA for all study cohorts. The combined model demonstrated considerable advantages in terms of predicted probabilities and consistently yielded greater net clinical benefit compared with the other model signatures, thereby underscoring its clinical utility. In the external validation cohort, the combined model showed numerically superior performance to BI-RADS assessment by senior radiologists using the ≥4B cutoff (AUC 0.949 [95% CI: 0.888, 1.000] vs. 0.897 [95% CI: 0.808, 0.986], p=0.15; Table 4).

Figure 4

Table 3

| Model name | Set | Accuracy | AUC (95% CI) | Sensitivity | Specificity | PPV | NPV | F1 |
|---|---|---|---|---|---|---|---|---|
| Combined | Training | 0.935 | 0.975 (0.945-1.000) | 0.909 | 0.950 | 0.909 | 0.950 | 0.909 |
| Combined | Internal validation | 0.852 | 0.929 (0.837-1.000) | 1.000 | 0.529 | 0.714 | 1.000 | 0.833 |
| Combined | External validation | 0.923 | 0.949 (0.888-1.000) | 0.846 | 0.949 | 0.846 | 0.949 | 0.846 |

Performances of the combined models in the study cohorts.

AUC, area under the receiver operating characteristic curve; BI-RADS, Breast Imaging-Reporting and Data System; NPV, negative predictive value; PPV, positive predictive value; F1, F1-score.

Figure 5

Table 4

| Model name | Accuracy | AUC (95% CI) | Sensitivity | Specificity | PPV | NPV | F1 |
|---|---|---|---|---|---|---|---|
| Combined | 0.923 | 0.949 (0.888-1.000) | 0.846 | 0.949 | 0.846 | 0.949 | 0.846 |
| US BI-RADS (senior radiologists, ≥4B cutoff) | 0.885 | 0.897 (0.808-0.986) | 0.923 | 0.872 | 0.706 | 0.971 | 0.800 |

Diagnostic performance of the combined model compared with BI-RADS assessment by senior radiologists in the external validation cohort.

AUC, area under the receiver operating characteristic curve; NPV, negative predictive value; PPV, positive predictive value; F1, F1-score.

3.4 Grad-CAM visualization

To assess the recognition capability of the DL models, the Grad-CAM technique was applied for visual interpretation. As illustrated in Figure 6, Grad-CAM localized the most informative regions in the final convolutional layer for cancer type prediction. This approach enabled identification of image areas exerting the greatest influence on model decisions, thereby enhancing the interpretability of the DL framework.

Figure 6

4 Discussion

MultiChannel_2.5D_DL models typically process multiple adjacent 2D slices or channels as input, thereby capturing partial 3D context without the computational burden associated with full 3D models. These models often employ parallel 2D CNNs or transformer-based architectures to integrate information across slices or modalities (Hu et al., 2019; Alsaih et al., 2020; Kim et al., 2024; Niu et al., 2025). In contrast, 2D models process single slices or channels independently, which limits their ability to capture inter-slice correlations (Xiong et al., 2023; Niu et al., 2025; Tondji et al., 2025). In this study, we developed MultiChannel_2.5D_DL, 2D_DL, and clinical models based on B-mode US and CEUS images, along with clinical data from the primary tumor, to predict the malignancy of breast nodules. Our results demonstrated that the segmentation model incorporating a 2-mm peritumoral ROI yielded the optimal predictive performance; in this context, the MultiChannel_2.5D_DL model outperformed both the clinical model and the 2D DL model in the internal and external validation sets.

ROI delineation was performed independently by two experienced radiologists, with any discrepancies resolved by a senior radiologist with over 20 years of experience. Although quantitative inter-observer consistency metrics (such as the Dice coefficient or intraclass correlation coefficient) were not formally calculated in this study, the senior review process was implemented to minimize variability and enhance reproducibility.

Although no data augmentation or class-balancing techniques were employed, overfitting was mitigated through transfer learning with ImageNet-pretrained weights; early stopping was not required owing to stable convergence, and external validation on an independent cohort provided a robust evaluation despite the class imbalance.

In previous studies, MultiChannel_2.5D_DL models have outperformed 2D models in tasks such as tumor classification, organ segmentation, and disease subtype differentiation. For example, in glioma classification, the MultiChannel_2.5D_DL model achieved AUC values of 0.806–0.870, compared with lower performance for 2D models (Niu et al., 2025). In pancreas segmentation, the 2.5D model reached a Dice similarity coefficient of 75.1%, surpassing the 2D model (Tondji et al., 2025). These findings are consistent with the results of the present study.

Recent studies have demonstrated that MultiChannel_2.5D_DL models, particularly when integrated with clinical and radiomics features, can significantly enhance diagnostic accuracy and risk stratification across diverse applications. For example, in glioma subtyping, a MultiChannel_2.5D_DL model achieved AUC values of 0.806–0.870, while a stacking ensemble further improved performance to 0.855–0.904 (Niu et al., 2025). In hepatocellular carcinoma recurrence prediction, a combined 2.5D DL and clinical model attained AUCs up to 0.921 (Zhang et al., 2024). These findings are consistent with the results of the present study. In our study, we integrated the MultiChannel_2.5D_DL, 2D_DL, and clinical models to create a combined model, which was evaluated using an external validation cohort and exhibited the highest performance among all models assessed.

Previous studies have highlighted the broad application of DL in breast cancer diagnosis, with many models achieving high diagnostic performance. For example, Jiang et al. (2021) applied deep convolutional neural networks to ultrasound (US) images for molecular subtype classification, achieving accuracies ranging from 80.07% to 97.02% across four subtypes. Xiang et al. (2023) reported that a dual attention-based CNN reached expert-level performance in differentiating malignant from benign tumors, with AUCs up to 0.96 in external datasets. Similarly, Jia et al. (2023) demonstrated that DL models could noninvasively predict tumor-infiltrating lymphocyte levels from US images, with an AUC of 0.873 and an external validation accuracy of 79.50%, providing an alternative to traditional biopsy. Despite these advances, most existing approaches rely primarily on B-mode US, with limited exploration of CEUS. To our knowledge, the present study is the first to utilize a TIC dataset to identify key phases of contrast enhancement and to integrate MultiChannel_2.5D_DL for predictive modeling of breast nodules. In addition, we systematically compared the diagnostic performance of models based on tumor bounding boxes, intratumoral regions, and intratumoral regions expanded by 2-mm and 5-mm peritumoral margins.

CEUS employs microbubble contrast agents to improve the visualization of blood flow in both intratumoral and peritumoral regions, thereby yielding insights into tumor biology and aggressiveness (Zhao et al., 2017). Specific enhancement patterns, such as perfusion defects and peripheral hyperenhancement, represent critical features for differentiating benign from malignant lesions (Li et al., 2025). Malignant lesions commonly demonstrate an expansion of the enhancement area beyond the tumor boundaries as depicted on conventional ultrasound (Cai et al., 2013; Boca (Bene) et al., 2021). A recent study has shown that radiomics models incorporating features from both intratumoral and peritumoral regions outperform those based solely on intratumoral features, attaining high diagnostic accuracy (AUCs up to 0.949) and offering enhanced clinical decision support for the early detection of breast cancer (Li et al., 2025). Moreover, CEUS-based radiomics, particularly when integrating intratumoral and peritumoral features, facilitates the noninvasive prediction of breast cancer molecular subtypes, including luminal, HER2-positive, and triple-negative variants. These combined models achieve high accuracy (AUCs up to 0.956) in distinguishing luminal from non-luminal subtypes (Xu et al., 2025). In addition, ultrasound-based radiomics can discriminate among three distinct HER2 expression states (positive, low, and zero) with macro-AUCs up to 0.988, thereby aiding personalized decisions on HER2-targeted therapies (Luo et al., 2025).

The present study integrates DL methodologies with CEUS imaging to enhance breast cancer diagnosis. Notably, it innovatively employs MultiChannel_2.5D technology for training predictive models, yielding superior performance of the intratumoral plus peritumoral models over the intratumoral model alone in the training cohort, consistent with prior investigations. In the external validation cohort, the intratumoral model combined with a 2-mm ROI exhibited higher efficacy compared to combinations with a 5-mm ROI or the bounding box model. This disparity may arise from the relatively small diameters of the breast nodules included in this study, where extending the peritumoral distance beyond a certain threshold incorporates excessive irrelevant information; in contrast, the 2-mm ROI comprehensively captures peritumoral blood flow data without including substantial extraneous details. In our MultiChannel_2.5D_DL model for predicting malignancy in breast nodules, Grad-CAM visualizations revealed focal activations manifesting as irregular clusters and hotspots, with predominant emphasis on lesion boundaries. This saliency pattern highlights the model’s reliance on edge-related features, which are augmented by morphological intricacies and high-contrast gradients derived from dynamic perfusion variations in CEUS time-intensity curve images across five temporal phases. Specifically, these boundaries elicit robust gradient responses attributable to characteristic perfusion profiles, such as peripheral hyperenhancement or filling defects, thereby enhancing discriminative capability for malignancy identification. The 2.5D architecture effectively assimilates these multi-channel temporal inputs, extracting and fusing inter-phase contextual cues pertaining to blood flow dynamics. Consequently, model attention is directed toward boundary regions, where cross-phase temporal coherence yields resilient features that underpin improved diagnostic accuracy.

In summary, our results demonstrate that MultiChannel_2.5D_DL models, when combined with CEUS and an optimally defined peritumoral ROI, provide superior diagnostic performance compared with 2D and clinical models. Notably, the 2-mm peritumoral region proved most effective, whereas larger ROIs introduced redundant information that impaired model efficacy. These findings underscore the dual importance of advanced network architectures and precise ROI delineation, and suggest that systematic evaluation of peritumoral region size may further improve the reliability and generalizability of CEUS-based DL models in breast cancer diagnosis.

Although the difference did not reach statistical significance (p=0.15), the combined model demonstrated diagnostic performance comparable to that of senior radiologists using the ≥4B cutoff (AUC 0.949 vs. 0.897; Table 4). These findings highlight its potential as an AI-assisted decision support tool to augment radiologist performance in clinical practice.

This study has several limitations. First, the relatively small sample size (n=141 overall, with only 13 malignant nodules in the external validation cohort) may limit statistical power and generalizability. To mitigate this, we employed transfer learning with ImageNet-pretrained weights, external validation across two centers with different ultrasound systems, and rigorous cross-validation strategies. Nevertheless, future studies with larger, more balanced cohorts are warranted to further validate the model. Second, the two centers used different ultrasound systems and probe frequencies over a 6–8 year period. Although mechanical index and contrast agent dosage were standardized, residual differences in imaging parameters cannot be entirely excluded. Importantly, the multichannel 2.5D model primarily captured relative changes in time-intensity curves across ROIs rather than absolute grayscale intensity. Consequently, differences in probe frequency or system gain are expected to have minimal impact on model performance, as relative enhancement patterns remain consistent across platforms.

5 Conclusions

Our study demonstrates that the proposed MultiChannel_2.5D_DL model, which incorporates TIC-derived multichannel inputs and an optimally identified 2-mm peritumoral ROI, offers promising accuracy and generalizability when integrated with 2D_DL and clinical features into a combined model. This combined model demonstrated diagnostic performance comparable to that of senior radiologists, although the difference did not reach statistical significance. These findings highlight its potential as an AI-assisted decision support tool for CEUS-based breast nodule malignancy prediction, potentially assisting radiologists in reducing interobserver variability and unnecessary biopsies.

Statements

Data availability statement

The datasets presented in this article are not readily available because the de-identified dataset is not publicly available due to institutional data protection policies and ethical restrictions; it can be accessed from the corresponding author upon reasonable request, subject to approval by the institutional ethics committee and execution of a formal data use agreement. Requests to access the datasets should be directed to Kun Sun, .

Ethics statement

The studies involving humans were approved by The Institutional Review Board of Shenzhen Hospital of Southern Medical University and Southern University of Science and Technology Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. The Institutional Review Board of Shenzhen Hospital of Southern Medical University and Southern University of Science and Technology Hospital waived the requirement of written informed consent for participation from the participants or the participants’ legal guardians/next of kin due to the retrospective nature of the analysis of de-identified data.

Author contributions

JFX: Conceptualization, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing, Formal analysis. YH: Data curation, Investigation, Software, Writing – review & editing, Validation. JHZ: Methodology, Software, Writing – review & editing, Formal analysis. CYL: Visualization, Writing – review & editing. JYZ: Investigation, Writing – review & editing. LW: Data curation, Visualization, Writing – review & editing. SYZ: Investigation, Visualization, Writing – review & editing. WJQ: Formal analysis, Project administration, Resources, Validation, Writing – review & editing. KS: Conceptualization, Project administration, Resources, Supervision, Validation, Writing – review & editing, Funding acquisition.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This study received funding from the Shenzhen Science and Technology Innovation Commission (JCYJ20220530112805013).

Acknowledgments

We thank all colleagues from all the centers for their contributions to this study.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Abbreviations

CEUS, Contrast-enhanced ultrasound; 2D, 2-dimensional; 2.5D, 2.5-dimensional; TIC, Time-intensity curve; ROI, Region of interest; CI, Confidence interval; DL, Deep learning; CDFI, Color Doppler flow imaging; AUC, Area under the receiver operating characteristic curve; ROC, Receiver operating characteristic curve; DCA, Decision curve analysis; Grad-CAM, Gradient-weighted class activation mapping; CNN, Convolutional neural network.

References

  • 1. Adler, D. D., Carson, P. L., Rubin, J. M., and Quinn-Reid, D. (1990). Doppler ultrasound color flow imaging in the study of breast cancer: preliminary findings. Ultrasound Med. Biol. 16, 553–559. doi: 10.1016/0301-5629(90)90020-d

  • 2. Aggarwal, R., Sounderajah, V., Martin, G., Ting, D., Karthikesalingam, A., King, D., et al. (2021). Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. NPJ Digit. Med. 4. doi: 10.1038/s41746-021-00438-z

  • 3. Al-antari, M. A., Han, S., and Kim, T.-S. (2020). Evaluation of deep learning detection and classification towards computer-aided diagnosis of breast lesions in digital X-ray mammograms. Comput. Methods Programs Biomed. 196. doi: 10.1016/j.cmpb.2020.105584

  • 4. Alsaih, K., Yusoff, M., Faye, I., Tang, T., and Mériaudeau, F. (2020). Retinal fluid segmentation using ensembled 2-dimensionally and 2.5-dimensionally deep learning networks. IEEE Access 8, 152452–152464. doi: 10.1109/ACCESS.2020.3017449

  • 5. Boca (Bene), I., Dudea, S., and Ciurea, A. (2021). Contrast-enhanced ultrasonography in the diagnosis and treatment modulation of breast cancer. J. Pers. Med. 11. doi: 10.3390/jpm11020081

  • 6. Cai, L., Zhang, J., Song, G., Chen, L., and Dai, J. (2013). Value of contrast-enhanced sonography in early diagnosis of breast cancer. Nan Fang Yi Ke Da Xue Xue Bao 33, 1801–1805.

  • 7. Chen, Y., Tang, L., Du, Z., Zhong, Z., Luo, J., Yang, L., et al. (2019). Factors influencing the performance of a diagnostic model including contrast-enhanced ultrasound in 1023 breast lesions: comparison with histopathology. Ann. Transl. Med. 7, 647. doi: 10.21037/atm.2019.10.83

  • 8. Giaquinto, A. N., Sung, H., Miller, K. D., Kramer, J. L., Newman, L. A., Minihan, A., et al. (2022). Breast cancer statistics 2022. CA Cancer J. Clin. 72, 524–541. doi: 10.3322/caac.21754

  • 9. Hu, J., Kuang, Y., Liao, B., Cao, L., Dong, S., and Li, P. (2019). A multichannel 2D convolutional neural network model for task-evoked fMRI data classification. Comput. Intell. Neurosci. 2019, 5065214. doi: 10.1155/2019/5065214

  • 10. Jia, Y., Wu, R., Lu, X., Duan, Y., Zhu, Y., Ma, Y., et al. (2023). Deep learning with transformer or convolutional neural network in the assessment of tumor-infiltrating lymphocytes (TILs) in breast cancer based on US images: a dual-center retrospective study. Cancers (Basel) 15, 838. doi: 10.3390/cancers15030838

  • 11. Jiang, M., Zhang, D., Tang, S.-C., Luo, X.-M., Chuan, Z.-R., Lv, W.-Z., et al. (2021). Deep learning with convolutional neural network in the assessment of breast cancer molecular subtypes based on US images: a multicenter retrospective study. Eur. Radiol. 31, 3673–3682. doi: 10.1007/s00330-020-07544-8

  • 12. Jung, E. M., Jung, F., Stroszczynski, C., and Wiesinger, I. (2021). Quantification of dynamic contrast-enhanced ultrasound (CEUS) in non-cystic breast lesions using external perfusion software. Sci. Rep. 11, 17677. doi: 10.1038/s41598-021-96137-6

  • 13. Kim, Y., Kim, Y.-G., Park, J.-W., Kim, B. W., Shin, Y., Kong, S. H., et al. (2024). A CT-based deep learning model for predicting subsequent fracture risk in patients with hip fracture. Radiology 310, e230614. doi: 10.1148/radiol.230614

  • 14. Li, G., Huang, X., Wu, H., Tian, H., Huang, Z., Wang, M., et al. (2025). Enhancing early breast cancer diagnosis with contrast-enhanced ultrasound radiomics: insights from intratumoral and peritumoral analysis. Clin. Breast Cancer 25. doi: 10.1016/j.clbc.2024.11.011

  • 15. Li, W., Zhao, Y., Fei, X., Wu, Y., Zhan, W., Zhou, W., et al. (2025). Image features and diagnostic value of contrast-enhanced ultrasound for ductal carcinoma in situ of the breast: preliminary findings. Ultrason. Imaging 47, 59–67. doi: 10.1177/01617346241292032

  • 16. Luo, S., Chen, X., Yao, M., Ying, Y., Huang, Z., Zhou, X., et al. (2025). Intratumoral and peritumoral ultrasound-based radiomics for preoperative prediction of HER2-low breast cancer: a multicenter retrospective study. Insights Imaging 16, 53. doi: 10.1186/s13244-025-01934-6

  • 17. Mahmood, T., Li, J., Pei, Y., Akhtar, F., Rehman, M., and Wasti, S. (2022). Breast lesions classifications of mammographic images using a deep convolutional neural network-based approach. PLoS One 17. doi: 10.1371/journal.pone.0263126

  • 18. Mridha, M. F., Hamid, M. A., Monowar, M. M., Keya, A. J., Ohi, A. Q., Islam, M. R., et al. (2021). A comprehensive survey on deep-learning-based breast cancer diagnosis. Cancers 13, 6116. doi: 10.3390/cancers13236116

  • 19. Niu, W., Yan, J., Hao, M., Zhang, Y., Li, T., Liu, C., et al. (2025). MRI transformer deep learning and radiomics for predicting IDH wild type TERT promoter mutant gliomas. NPJ Precis. Oncol. 9, 89. doi: 10.1038/s41698-025-00884-y

  • 20. Tondji, I., Lizzi, F., Scapicchio, C., and Retico, A. (2025). 2.5D deep learning model with attention mechanism for pancreas segmentation on CT scans. 1, 669–675. doi: 10.5220/0013314500003911

  • 21. Vickers, A. J., Cronin, A. M., Elkin, E. B., and Gonen, M. (2008). Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers. BMC Med. Inform. Decis. Mak. 8, 53. doi: 10.1186/1472-6947-8-53

  • 22. Wang, J., Zhao, R., and Cheng, J. (2022). Diagnostic accuracy of contrast-enhanced ultrasound to differentiate benign and malignant breast lesions: a systematic review and meta-analysis. Eur. J. Radiol. 149. doi: 10.1016/j.ejrad.2022.110219

  • 23. Xiang, H., Wang, X., Xu, M., Zhang, Y., Zeng, S., Li, C., et al. (2023). Deep learning-assisted diagnosis of breast lesions on US images: a multivendor, multicenter study. Radiol. Artif. Intell. 5, e220185. doi: 10.1148/ryai.220185

  • 24. Xiong, Y., Guo, W., Liang, Z., Wu, L., Ye, G., Liang, Y., et al. (2023). Deep learning-based diagnosis of osteoblastic bone metastases and bone islands in computed tomography images: a multicenter diagnostic study. Eur. Radiol. 33. doi: 10.1007/s00330-023-09573-5

  • 25. Xu, H., Yang, A., Kang, M., Lai, H., Zhou, X., Chen, Z., et al. (2025). Intratumoral and peritumoral radiomics signature based on DCE-MRI can distinguish between luminal and non-luminal breast cancer molecular subtypes. Sci. Rep. 15, 14720. doi: 10.1038/s41598-025-98155-0

  • 26. Zhang, Y.-B., Chen, Z.-Q., Bu, Y., Lei, P., Yang, W., and Zhang, W. (2024). Construction of a 2.5D deep learning model for predicting early postoperative recurrence of hepatocellular carcinoma using multi-view and multi-phase CT images. J. Hepatocell. Carcinoma 11, 2223–2239. doi: 10.2147/JHC.S493478

  • 27. Zhao, Y.-X., Liu, S., Hu, Y.-B., Ge, Y.-Y., and Lv, D.-M. (2017). Diagnostic and prognostic values of contrast-enhanced ultrasound in breast cancer: a retrospective study. Onco Targets Ther. 10, 1123–1129. doi: 10.2147/OTT.S124134

  • 28. Zhou, S., Le, J., Zhou, J., Huang, Y., Qian, L., and Chang, C. (2020). The role of contrast-enhanced ultrasound in the diagnosis and pathologic response prediction in breast cancer: a meta-analysis and systematic review. Clin. Breast Cancer 20, e490–e509. doi: 10.1016/j.clbc.2020.03.002


Keywords

breast nodules, computer-aided diagnosis, contrast-enhanced ultrasonography, deep learning, malignancy prediction

Citation

Xie J, He Y, Zhu J, Liu C, Zhan J, Wang L, Zhang S, Qin W and Sun K (2026) A 2.5D multichannel deep learning model using contrast-enhanced ultrasound for predicting malignancy in breast nodules: a two-center study. Front. Physiol. 17:1820868. doi: 10.3389/fphys.2026.1820868

Received

02 March 2026

Revised

17 March 2026

Accepted

02 April 2026

Published

01 May 2026

Volume

17 - 2026

Edited by

Feng Gao, The Sixth Affiliated Hospital of Sun Yat-sen University, China

Reviewed by

Zekun Jiang, Sichuan University, China

Jun Gao, Stork Healthcare, China



*Correspondence: Wenjian Qin, ; Kun Sun,

†These authors have contributed equally to this work and share first authorship

