ORIGINAL RESEARCH article

Front. Cell Dev. Biol., 26 November 2025

Sec. Molecular and Cellular Pathology

Volume 13 - 2025 | https://doi.org/10.3389/fcell.2025.1669651

Enhancing preoperative HER2 status classification of invasive breast cancers using machine learning models based on clinicopathological and MRI features: a multicenter study

Suhong Zhao1, Zhaohua Li1, Yanan Wang2, Fang Zhao3, Peipei Chen1, Guodong Pang1*
  • 1Department of Radiology, The Second Hospital of Shandong University, Jinan, China
  • 2Department of Radiology, Linglong Yingcheng Hospital, Yantai, China
  • 3Department of Radiology, Qilu Hospital of Shandong University, Jinan, China

Rationale and Objectives: The human epidermal growth factor receptor 2 (HER2) gene status is crucial for determining treatment efficacy. This study assessed preoperative HER2 classification in breast cancer using machine learning based on clinicopathological and MRI characteristics.

Materials and Methods: This retrospective study involved 1,015 patients (1,030 lesions) across two centers. Patients were divided into training, internal validation, and external validation sets. Nomograms were developed using clinicopathological and MRI features. Predictive models were constructed using decision trees (DT), support vector machines (SVM), k-nearest neighbors (k-NN), artificial neural networks (ANN), and multivariable logistic regression (LR). Model performance was evaluated using receiver operating characteristic curves, decision curve analysis, and calibration curves. Model interpretability was achieved by developing nomograms and employing SHAP (SHapley Additive exPlanations) analysis.

Results: Key variables for distinguishing HER2-positive from HER2-negative cases included regional N category, estrogen receptor, PR (progesterone receptor) status, Ki-67 status, lesion number, distribution quadrant, and accompanying signs. The SVM model achieved the highest AUC of 0.86 (95% confidence interval (CI): 0.81–0.90) in the training set, while the ANN model had an AUC of 0.77 (95% CI: 0.67–0.86) in the internal validation set. In the external validation set, the LR model achieved the highest AUC of 0.66 (95% CI: 0.56–0.76), although the overall performance was modest. For HER2-low versus HER2-zero differentiation, Ki-67 status, lesion number, distribution quadrant, mass shape, early enhancement rate, and ADC (apparent diffusion coefficient) were significant. The SVM model attained the highest AUC of 0.87 (95% CI: 0.83–0.91) in the training set, while the LR model demonstrated superior generalizability, yielding the highest AUCs in both the internal and external validation sets (internal: 0.67, 95% CI: 0.58–0.76; external: 0.74, 95% CI: 0.65–0.83). Radiologists benefited from the nomogram for improved diagnostic accuracy, especially junior radiologists. SHAP analysis revealed that PR status was paramount for HER2-positive classification, whereas mass shape and ADC values were dominant for identifying HER2-low status.

Conclusion: Integrating machine learning with clinicopathological and MRI characteristics improves the accuracy of HER2 status classification in breast cancer and enhances diagnostic capabilities for radiologists in clinical practice.

1 Introduction

Breast cancer is a highly heterogeneous disease with complex clinical and pathological manifestations (Brenner et al., 2020; Szymiczek et al., 2021). The treatment of breast cancer depends on the TNM stage and pathological characteristics, particularly the molecular subtype (Barzaman et al., 2020). The human epidermal growth factor receptor 2 (HER2) gene plays a crucial role in breast cancer, as it not only determines the molecular subtype but also directly influences treatment selection and efficacy (Barzaman et al., 2020). Traditionally, HER2 status categorizes tumors as HER2-positive or HER2-negative (Wolff et al., 2018). HER2-targeted therapies are effective in HER2-positive patients (Loiblm et al., 2022). However, the American Society of Clinical Oncology and College of American Pathologists guidelines have redefined HER2-negative cases into two subcategories: HER2-low expression and HER2-zero expression (Tarantino et al., 2023). HER2-low breast cancer patients account for more than half of the traditional HER2-negative cohort (Marchio et al., 2021). Compared to HER2-positive and HER2-zero expression breast cancer, the prevalence of estrogen receptors (ER) and progesterone receptors (PR) is higher in HER2-low tumors, while Ki-67 levels tend to be lower (Marchio et al., 2021; Zhang et al., 2022). HER2-low breast cancer may benefit from new therapeutic interventions, such as antibody-drug conjugates (Marchio et al., 2021; Modi et al., 2022; Xin et al., 2022; Zhang et al., 2022; Yang et al., 2023). Therefore, accurately determining HER2 expression status in breast cancer patients is crucial for identifying potential candidates for anti-HER2 therapies. Clinically, HER2 status is primarily assessed through biopsy; however, this invasive procedure may introduce errors due to tumor heterogeneity and variability in specimen quantity and quality (Chen et al., 2023). Thus, it is essential to develop non-invasive methods for predicting HER2 status.

Multiparameter breast magnetic resonance imaging (MRI) techniques, including T2-weighted imaging (T2WI), T1-weighted dynamic contrast-enhanced MR imaging (DCE-MRI), and diffusion-weighted imaging (DWI), non-invasively reflect tumor vascularity and cellular density, providing comprehensive lesion characterization (Yang et al., 2017; Xie et al., 2019; Xu et al., 2022). Artificial intelligence (AI) models, particularly radiomics, have been applied to breast MRI for tasks like benign-malignant differentiation and HER2 status prediction (Chen et al., 2020; Kim et al., 2021; Bian et al., 2023; Ramtohul et al., 2023; Zhang et al., 2023; Chen et al., 2024; Guo et al., 2024; Peng et al., 2024; Zheng et al., 2024). However, radiomics models often suffer from poor interpretability and require specialized software for feature extraction, limiting their clinical adoption. In contrast, using routinely available clinicopathological data and qualitative MRI features assessed by radiologists offers a more transparent and potentially more accessible approach. Zhou et al. (2025) recently used BI-RADS features with machine learning (ML) algorithms to classify HER2 status, suggesting its promise, but their study lacked external validation. Other approaches using synthetic MRI or DWI have been limited by small sample sizes (Zhan et al., 2024).

Therefore, this study aims to develop and validate a clinically practical framework for preoperative HER2 status classification by integrating readily available clinicopathological characteristics and conventional MRI features using ML. Our approach distinctively focuses on using features that are directly interpretable by clinicians, thereby enhancing the model’s transparency and potential for integration into routine workflows. We constructed predictive models and clinical nomograms to first differentiate HER2-positive from HER2-negative breast cancer, and then to distinguish the clinically critical HER2-low from HER2-zero subcategories within the HER2-negative group. Furthermore, we conducted an external validation to assess generalizability and evaluated whether the nomograms could enhance the diagnostic performance of radiologists with varying experience levels. This research may contribute to more precise, non-invasive HER2 stratification to guide clinical decision-making.

2 Materials and Methods

2.1 Ethics statement

The study conformed to the provisions of the Declaration of Helsinki and was approved by the Ethics Committee of the Second Hospital of Shandong University (Approval No: KYLL2025281; Date: 21 Feb 2025). Informed consent was waived for all patients due to the nature of the retrospective analysis.

2.2 Patients

This retrospective study involved two centers: the Second Hospital of Shandong University (referred to as “Center 1”) and the Qilu Hospital of Shandong University (referred to as “Center 2”). Consecutive breast cancer patients who underwent MRI at Center 1 (July 2020 to December 2023) and Center 2 (January 2023 to August 2023) were included. For the selection of lesions, bilateral breast lesions were measured separately, and among multiple lesions on the same side, the largest lesion was selected for measurement. Inclusion criteria: 1) breast cancer patients with HER2-positive, HER2-low, or HER2-zero expression confirmed by pathology, with complete clinicopathological data and no distant metastasis; 2) patients who underwent multiparameter MRI examination, including T2WI, DWI, and DCE-MRI. Exclusion criteria: 1) patients with incomplete clinicopathological data or distant metastasis; 2) patients with non-invasive breast cancer; 3) patients who underwent radiotherapy, chemotherapy, or breast biopsy before MRI examination; 4) patients with incomplete MRI image data or poor image quality; 5) patients exhibiting pure non-mass enhancement (NME) on DCE-MRI.

2.3 MRI examination

At Center 1, MRI examinations were performed using a 3.0T MRI system (GE Discovery MR750, United States) with an 8-channel dedicated breast surface coil. Patients were positioned prone, and both breasts were scanned simultaneously. The examination was conducted 7–14 days after menstruation. The scanning included T2 IDEAL, T1-weighted fast spin echo, DWI, and DCE-MRI sequences. DWI was acquired in the transverse plane using single-shot echo-planar imaging. The b-values were set at 0 s/mm² and 800 s/mm², and the scanning parameters were as follows: repetition time (TR): 3,000 ms, echo time (TE): 49.5 ms, slice thickness: 5 mm, slice gap: 1.0 mm, field of view (FOV): 360 × 360 mm, matrix: 128 × 96, and number of excitations (NEX): 4. DCE-MRI was acquired using the Vibrant-Flex technique. Following the plain scan, the contrast agent gadodiamide (0.2 mmol/kg body weight) was injected at a rate of 2 mL/s, followed by a 20 mL saline flush. A mask scan was conducted before the administration of the contrast agent, and dynamic enhancement images were acquired immediately after the saline injection. A total of 7 sequences were obtained without intervals, each lasting 60 s. The parameters for these scans were as follows: TR: 3.9 ms, TE: 1.7 ms, flip angle: 5°, FOV: 360 × 360 mm, matrix: 348 × 348, slice thickness: 1.8 mm, and NEX: 0.7.

At Center 2, breast MRI scans were also performed using a 3.0T MRI system (GE SIGNA Architect 3.0T, United States) and an 8-channel dedicated breast surface coil. Patients were positioned in the prone position. The breast MRI protocol included several sequences: an axial pre-contrast 2D fast spin echo T2-weighted fat-suppressed sequence (TR: 4,000–6,000 ms, TE: 80–100 ms, matrix: 320 × 256, slice thickness: 3–4 mm, FOV: 36 × 36 cm, NEX: 2, and scan time: 120–180 s); an axial pre-contrast diffusion-weighted echo-planar imaging sequence (TR: 5,000–8,000 ms, TE: 60–80 ms, matrix: 128 × 128, slice thickness: 4–5 mm, FOV: 360 mm × 360 mm, NEX: 2–4, and scan time: 180–240 s); and an axial dynamic 3D spoiled gradient-echo T1-weighted fat-suppressed sequence (flip angle: 10°–12°, TR: 4–6 ms, TE: 1.5–2.5 ms, matrix: 512 × 512, slice thickness: 1–2 mm, FOV: 360 mm × 360 mm, and NEX: 1). Additionally, a sagittal 3D spoiled gradient-echo post-contrast T1-weighted sequence was performed. The DCE-MRI utilized axial imaging and included one pre-contrast and 5 post-contrast dynamic series. Contrast-enhanced axial images were captured at 1.5, 3, 4.5, and 6 min post-contrast injection, followed by a delayed sagittal image obtained 8 min after injection. A bolus of 0.1 mmol/kg of gadodiamide contrast agent was administered at a rate of 2 mL/s, accompanied by a 20 mL saline flush.

2.4 Image analysis

All MRI images were evaluated on the GE AW4.6 workstation by two radiologists with 8 and 26 years of experience, respectively, utilizing the Breast Imaging Reporting and Data System (BI-RADS) (Pinker et al., 2013). Both radiologists were fully blinded to all patient information, including clinical history, laboratory results, and pathological findings (e.g., HER2, ER, PR, and Ki-67 status). In cases of initial disagreement, a consensus decision was reached through a joint re-review of the images and direct discussion between the two radiologists. For early enhancement rate (EER) and time-signal intensity curve (TIC) evaluation, the region of interest (ROI) was analyzed using the T1 Perfusion software. During ROI delineation, areas exhibiting liquefied necrosis and hemorrhage were avoided. The ROI was placed in the most prominent area of enhancement and was smaller than the overall size of the lesion. The EER was calculated using the formula (SI_post - SI_pre)/SI_pre × 100%, where SI represents signal intensity, and SI_pre and SI_post denote the signal intensities before and after enhancement, respectively. SI_post was taken as the signal intensity measured 2 min after enhancement, or the peak signal intensity within 2 min. The delayed phase of the TIC was defined as the portion of the curve after 2 min post-enhancement or after the point at which the curve changed. TIC was categorized into three distinct types: progressive type (Type I, characterized by a consistent increase in signal intensity over time), platform type (Type II, where signal intensity remains constant after the initial enhancement), and washout type (Type III, in which signal intensity decreases after reaching peak enhancement). For apparent diffusion coefficient (ADC) measurement, a small ROI was selected in the darkest part of the lesion on the ADC map, while avoiding areas of liquefied necrosis, significant noise, or unenhanced regions (Baltzer et al., 2020). Multiple measurements were taken, ensuring consistent ROI size, and the average minimum ADC value for each lesion was recorded.
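The EER formula and the three TIC types described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the ±10% tolerance used to separate a plateau from a rise or a washout is an assumption made only for illustration; the study does not state a numeric cut-off, and the authors measured these values with the vendor's T1 Perfusion software.

```python
def early_enhancement_rate(si_pre: float, si_post: float) -> float:
    """Return EER in percent: (SI_post - SI_pre) / SI_pre * 100."""
    if si_pre <= 0:
        raise ValueError("si_pre must be a positive signal intensity")
    return (si_post - si_pre) / si_pre * 100.0


def classify_tic(si_early: float, si_delayed: float, tolerance: float = 0.10) -> str:
    """Classify the delayed phase of a time-signal intensity curve.

    si_early   -- signal intensity at the end of the early phase
    si_delayed -- signal intensity in the delayed phase
    tolerance  -- ASSUMED relative change (10%) treated as a plateau;
                  this threshold is not taken from the study.
    """
    if si_early <= 0:
        raise ValueError("si_early must be a positive signal intensity")
    relative_change = (si_delayed - si_early) / si_early
    if relative_change > tolerance:
        return "Type I (progressive)"
    if relative_change < -tolerance:
        return "Type III (washout)"
    return "Type II (platform)"


if __name__ == "__main__":
    # Illustrative signal intensities, not patient data.
    print(early_enhancement_rate(si_pre=100.0, si_post=165.0))
    print(classify_tic(si_early=200.0, si_delayed=150.0))
```

The function returns the EER as a percentage, following the formula; dividing by 100 gives the fractional form in which EER values are reported in the Results.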

2.5 Assessment of histologic grade and tumor biomarkers in breast cancer

The histologic grade of tumor tissue was assessed using the Bloom-Richardson method, which evaluates tubular formation, nuclear pleomorphism, and mitotic count, each of which was assigned a score of 1, 2, or 3 points. Total scores of 3–5, 6–7, and 8–9 were classified as histological grades I, II, and III, respectively (Rakha et al., 2008). HER2 status was determined using immunohistochemistry (IHC) and in situ hybridization (ISH). HER2 status was categorized into HER2-positive (IHC score 3+ or IHC score 2+ with positive ISH amplification) and HER2-negative (IHC score 0 or 1+, or IHC score 2+ without ISH amplification) (Wolff et al., 2018; Tarantino et al., 2023). HER2-negative status was further classified into HER2-low (IHC score 1+ or IHC score 2+ without ISH amplification) and HER2-zero (IHC score 0). The criteria for evaluating ER and PR results were as follows: a positive percentage of ≥1% was considered positive, while a percentage of <1% was deemed negative. The hormone receptor (HR) status was considered positive if either ER or PR, or both, were positive (Allison et al., 2020). If both ER and PR were negative, HR was classified as negative. The Ki-67 index was measured based on the positive staining area, with ≥20% considered positive and <20% considered negative (Lee et al., 2023). Based on the ER, PR, HER2 status, and Ki-67 index, breast cancer was categorized into four molecular subtypes (Lawton, 2023): Luminal A (ER+, PR+, HER2-negative, and low Ki-67), Luminal B (ER+, PR-negative or low, HER2-negative, and high Ki-67), HER2-positive (ER-negative, PR-negative, and HER2-positive), and triple-negative (ER-negative, PR-negative, and HER2-negative).
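The biomarker rules above reduce to a few threshold checks. The following Python sketch encodes them as stated in this section; the function names and the string encoding of IHC scores are hypothetical choices for illustration and are not part of the study's workflow.

```python
from typing import Optional


def her2_category(ihc_score: str, ish_amplified: Optional[bool] = None) -> str:
    """Map an IHC score (and ISH result for 2+) to a HER2 category."""
    if ihc_score == "3+":
        return "HER2-positive"
    if ihc_score == "2+":
        if ish_amplified is None:
            raise ValueError("IHC 2+ requires an ISH result")
        return "HER2-positive" if ish_amplified else "HER2-low"
    if ihc_score == "1+":
        return "HER2-low"
    if ihc_score == "0":
        return "HER2-zero"
    raise ValueError(f"unknown IHC score: {ihc_score!r}")


def is_her2_negative(category: str) -> bool:
    """HER2-negative comprises the HER2-low and HER2-zero subcategories."""
    return category in {"HER2-low", "HER2-zero"}


def hormone_receptor_positive(er_percent: float, pr_percent: float) -> bool:
    """HR is positive if ER or PR (or both) stain in >= 1% of cells."""
    return er_percent >= 1.0 or pr_percent >= 1.0


def ki67_positive(ki67_percent: float) -> bool:
    """Ki-67 is positive at >= 20% staining."""
    return ki67_percent >= 20.0


if __name__ == "__main__":
    print(her2_category("2+", ish_amplified=False))
    print(hormone_receptor_positive(er_percent=0.0, pr_percent=5.0))
```

Separating the three-way category from the binary HER2-negative check mirrors the two classification tasks in the study: positive versus negative first, then low versus zero within the negative group.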

2.6 Collection of clinicopathologic and MRI features

The clinical characteristics examined included patient age, menopausal status, regional N category, and primary T category. The pathological characteristics consisted of histological grade, HR status, ER status, PR status, Ki-67 status, and molecular subtype. Furthermore, a range of MRI features was observed and analyzed, including fibroglandular tissue component, background parenchymal enhancement, tumor diameter, tumor distribution, lesion location, distribution quadrant of lesions, signal on T2WI, intratumoral edema, peritumoral edema, lesion enhancement type, mass shape, mass margin, EER, TIC, ADC, increased vascularity or adjacent vessel sign, lymphadenectasis (axillary or internal mammary), and accompanying signs (such as nipple inversion, skin retraction, and pectoralis muscle invasion), and BI-RADS classifications.

2.7 Construction of ML predictive models

LASSO regression was used to select variables with non-zero coefficients for data dimensionality reduction and feature screening. Subsequently, a multivariable logistic regression model was constructed using the selected variables. Variables with p-values less than 0.1 in the multivariable logistic regression underwent further evaluation through stepwise regression for refined variable selection. Ultimately, the variables with p-values less than 0.05 in the stepwise regression were used for the construction of the final predictive models and the nomogram model. These selected variables were also used to train predictive models on the training set via five-fold cross-validation employing five ML algorithms: decision trees (DT), support vector machines (SVM), k-nearest neighbors (k-NN), artificial neural networks (ANN), and multivariable logistic regression (LR). Each model was then applied to both the internal and external validation sets. The predictive performance of each model was assessed using receiver operating characteristic (ROC) curves. The area under the curve (AUC), sensitivity, and specificity were calculated. A calibration curve was used to visualize the degree of calibration of the predictive models. Furthermore, the clinical applicability of the models was evaluated through decision curve analysis (DCA). To enhance model interpretability, SHapley Additive exPlanations (SHAP) analysis was conducted to quantify the contribution of individual clinical, pathological, and imaging features to the model’s predictions.
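For readers less familiar with the evaluation metrics named above, the following Python sketch shows how AUC, sensitivity, and specificity can be computed from predicted scores. It is a plain-Python illustration, not the authors' code (the study used R): the AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function names and toy data are hypothetical.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5  # ties count as half
    return correct / (len(positives) * len(negatives))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy data for illustration only (1 = HER2-positive, 0 = HER2-negative).
    y = [1, 1, 0, 0, 0]
    p = [0.9, 0.4, 0.5, 0.3, 0.1]
    print(roc_auc(y, p))
    print(sensitivity_specificity(y, p, threshold=0.45))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of this size; a rank-based formulation gives the same value more efficiently.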

2.8 Validation of predictive model by independent radiological assessment

To further validate the predictive value of the ML models, two additional radiologists with varying levels of seniority (Radiologist 1 with 18 years of experience and Radiologist 2 with 6 years of experience) were invited to assess the images of each patient in the external validation set. Both radiologists were blinded to pathological information. They classified each lesion as representing HER2-positive, HER2-low, or HER2-zero breast cancer. Each radiologist then reclassified the images with the assistance of the optimal model, defined as the model with the highest AUC, to investigate the incremental benefit of the model for diagnostic radiologists.

2.9 Statistical analysis

Categorical variables are presented as counts and percentages and were compared with chi-square tests. Continuous variables are expressed as means and standard deviations or as median (interquartile range). The Kolmogorov-Smirnov test was applied to determine the normality of the distribution. If a variable was normally distributed, a t-test was utilized to compare group differences; if not, the Wilcoxon test was used to evaluate significant differences in medians between groups. To control the risk of false positives arising from multiple comparisons, the False Discovery Rate (FDR) correction was applied to the p-values from all univariate comparisons. Features with an FDR-adjusted p-value of ≤0.05 were considered statistically significant. The inter-observer agreement for MRI features was assessed using intraclass correlation coefficients (ICCs) for continuous variables and kappa coefficients for categorical variables. ICC values were classified as follows: less than 0.5, poor; between 0.5 and 0.75, moderate; between 0.75 and 0.9, good; greater than 0.90, excellent. Kappa coefficients were classified as follows: less than 0.00, poor; between 0.00 and 0.20, slight; between 0.21 and 0.40, fair; between 0.41 and 0.60, moderate; between 0.61 and 0.80, substantial; between 0.81 and 1.00, almost perfect. All statistical analyses were performed using R software (version 4.4.2).
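The FDR correction mentioned above can be sketched as follows. The text does not name the procedure used; this example assumes the Benjamini-Hochberg method, which is what R's p.adjust applies for method = "fdr". The function name and the p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.20]  # illustrative values, not study results
    for p_raw, p_adj in zip(raw, benjamini_hochberg(raw)):
        flag = "significant" if p_adj <= 0.05 else "not significant"
        print(f"raw={p_raw:.3f} adjusted={p_adj:.4f} {flag}")
```

In this toy example, raw p-values of 0.03 and 0.04 would pass an unadjusted 0.05 threshold but fail after adjustment, which is the situation the correction is meant to guard against.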

3 Results

3.1 Clinicopathological characteristics of HER2-positive, HER2-negative, HER2-low and HER2-zero breast cancer patients

The flow chart of this study is illustrated in Figure 1. Consecutive breast cancer patients who underwent MRI at Center 1 (July 2020 to December 2023) and Center 2 (January 2023 to August 2023) were initially screened. A total of 57 patients exhibiting pure NME on DCE-MRI were excluded based on the predefined exclusion criterion, as the standard BI-RADS features for masses are not directly applicable to this enhancement type. After applying all other inclusion and exclusion criteria, a total of 861 patients (including 15 bilateral patients) with 876 lesions from Center 1 and 154 patients with 154 lesions from Center 2 were finally included. The patient lesions from Center 1 were allocated to the training set and internal validation set in a 7:3 ratio, while those from Center 2 served as the external validation set.

Figure 1
Flowchart illustrating the selection and classification of breast cancer patient data from two centers for a retrospective study. Patients were excluded based on reasons such as incomplete data, noninvasive cancer, prior treatment, and poor image quality. The final study groups include HER2-positive versus HER2-negative, and HER2-low versus HER2-zero, each further divided into training, internal validation, and external validation datasets with specific sample sizes mentioned for each category.

Figure 1. The study flow chart of patient enrollment.

For patients with HER2-positive and HER2-negative breast cancers, the training set consisted of 613 lesions (128 HER2-positive and 485 HER2-negative), the internal validation set comprised 263 lesions (41 HER2-positive and 222 HER2-negative), and the external validation set included 154 lesions (Figure 1). The clinicopathological characteristics of these HER2-positive and HER2-negative breast cancer patients are shown in Table 1. After FDR correction, histological grade, primary T category, HR status, ER status, PR status, Ki-67 status, and molecular subtype were statistically significant between HER2-positive and HER2-negative breast cancer patients in the training set (FDR-adjusted p-values ≤0.05). Compared with HER2-negative patients, HER2-positive patients were more likely to have a higher histological grade, negative ER, negative PR, and higher Ki-67 status. In both the internal and external validation sets, no differences were found in histological grade and tumor staging (FDR-adjusted p-values >0.05), while ER status and Ki-67 status were significantly different (FDR-adjusted p-values ≤0.05). Additionally, there were no statistical differences in patient age, menopausal status, or lymph node staging across the training, internal validation, and external validation sets (FDR-adjusted p-values >0.05).

Table 1. Clinicopathological and MRI features of HER2-positive and HER2-negative breast cancer patients in the training, internal validation, and external validation sets.

For the comparison of HER2-low and HER2-zero breast cancers, there were 494 lesions (369 HER2-low and 125 HER2-zero) in the training set, 213 lesions (166 HER2-low and 47 HER2-zero) in the internal validation set, and 122 lesions (93 HER2-low and 29 HER2-zero) in the external validation set (Figure 1). Table 2 summarizes the clinicopathological characteristics of HER2-low and HER2-zero breast cancer patients. In the training set, ER status (FDR-adjusted p-value = 0.02) and molecular subtype (FDR-adjusted p-value = 0.005) differed significantly between HER2-low and HER2-zero breast cancer patients. The proportion of ER-positive tumors was higher in HER2-low breast cancer. In the internal validation set, the HER2-low and HER2-zero breast cancer patients were significantly different in HR status, ER status, PR status, and Ki-67 status (FDR-adjusted p-values <0.05). Additionally, among the molecular subtypes, triple-negative breast cancer was more prevalent in HER2-zero patients, whereas the Luminal subtype was more common in HER2-low patients. Significant differences were observed in these molecular subtypes across the training, internal, and external validation sets (FDR-adjusted p-values <0.05). However, patient age, menopausal status, histological grade, lymph node staging, and tumor staging did not demonstrate statistical significance (FDR-adjusted p-values >0.05).

Table 2. Clinicopathological and MRI features of HER2-low and HER2-zero breast cancer patients in the training, internal validation, and external validation sets.

3.2 MRI features of HER2-positive, HER2-negative, HER2-low, and HER2-zero breast cancer patients

The MRI characteristics of patients with HER2-positive and HER2-negative breast cancer were compared. Supplementary Table S1 summarizes the inter-observer agreement for the MRI features. Agreement for continuous MRI features was good to excellent (ICC, 0.88–0.96), and agreement for categorical MRI features was substantial to almost perfect (kappa, 0.74–0.88). As shown in Table 1, there were no significant differences in fibroglandular tissue composition, background parenchymal enhancement, lesion location, distribution quadrant, T2WI signal, mass shape, mass margin, mass internal enhancement pattern, EER, TIC, or BI-RADS classification between HER2-positive and HER2-negative patients in the training set (FDR-adjusted p-values >0.05). The tumor diameter for HER2-positive patients was 2.45 (1.9, 3) cm, significantly larger than that for HER2-negative patients (2.1 (1.6, 2.9) cm) (FDR-adjusted p-value = 0.02). Multifocal lesions, intratumoral and peritumoral edema, ipsilateral vascularity increase, lymphadenectasis, and other accompanying signs were more prominent in HER2-positive patients (FDR-adjusted p-values <0.05). Furthermore, masses with NME were observed in 28.9% of HER2-positive patients and 12.8% of HER2-negative patients, a significant difference (FDR-adjusted p-value <0.001). The median ADC value in the HER2-positive group was 1.00 (0.89, 1.11) × 10⁻³ mm²/s, which was comparable to that in the HER2-negative group (0.96 (0.85, 1.09) × 10⁻³ mm²/s; FDR-adjusted p-value = 0.09). In the internal validation set, multifocal or multicentric lesions were found in 53.9% of HER2-positive patients and 30.1% of HER2-negative patients, a statistically significant difference (FDR-adjusted p-value = 0.02), while no statistically significant differences were found in other MRI features (FDR-adjusted p-values >0.05). In the external validation set, the lesion enhancement type differed between HER2-positive and HER2-negative patients (FDR-adjusted p-value = 0.05). Two case examples involving HER2-positive patients are illustrated in Supplementary Figures S1, S2. Supplementary Figure S1 depicts a HER2-positive case characterized by peritumoral edema and multiple heterogeneously enhancing oval masses. Supplementary Figure S2 presents another HER2-positive case featuring masses with NME.

The MRI characteristics of patients with HER2-low and HER2-zero breast cancer are presented in Table 2. In the training set, the number of lesions, mass shape, EER, and ADC were found to be related to HER2 status. Specifically, patients with HER2-low breast cancer were more likely to exhibit multifocality than those with HER2-zero status (FDR-adjusted p-value = 0.05). While HER2-low breast cancer predominantly exhibited irregular mass shapes, HER2-zero tumors could present as irregular, round, or oval, with proportions being similar between the two groups. The EERs for HER2-low and HER2-zero breast cancers were 0.65 (0.56, 0.75) and 0.60 (0.50, 0.67), respectively, a statistically significant difference (FDR-adjusted p-value <0.001). The mean minimum ADC was significantly lower in HER2-low breast cancer (0.94 (0.85, 1.06) × 10⁻³ mm²/s) than in HER2-zero breast cancer (1.00 (0.90, 1.12) × 10⁻³ mm²/s) (FDR-adjusted p-value = 0.01). In the internal validation set, the internal enhancement patterns differed significantly between the two groups (FDR-adjusted p-value = 0.03). The mean minimum ADC value for HER2-low breast cancer was (0.99 ± 0.21) × 10⁻³ mm²/s, significantly lower than that for HER2-zero breast cancer ((1.1 ± 0.2) × 10⁻³ mm²/s) (FDR-adjusted p-value = 0.01). In the external validation set, significant differences were found in EER (FDR-adjusted p-value = 0.002) and mean minimum ADC values (FDR-adjusted p-value = 0.002) between HER2-low and HER2-zero breast cancers. The imaging findings of two cases with HER2-low and HER2-zero status are presented in Supplementary Figures S3, S4. Each had a single lesion characterized by mass enhancement. Supplementary Figure S3 depicts a HER2-low case with an irregular tumor, an early enhancement rate of 0.84, and an ADC value of 0.83 × 10⁻³ mm²/s. Supplementary Figure S4 presents a HER2-zero case showing peritumoral edema and a rounded tumor, with an early enhancement rate of 0.58 and an ADC value of 0.99 × 10⁻³ mm²/s.

3.3 Construction and performance evaluation of the models for differentiating HER2-positive from HER2-negative breast cancer

LASSO regression identified the following variables: clinical regional N category, ER status, PR status, Ki-67 status, lesion number, distribution quadrant of lesions, and accompanying signs. Multivariable logistic and stepwise regressions were then used to construct the predictive models and the nomogram (Figure 2A). Furthermore, a SHAP analysis was conducted on the training set to assess the significance of each feature within the nomogram model. As shown in Figure 2B, PR status emerged as the most influential feature, with positive status substantially increasing the prediction probability for HER2-positive classification. Accompanying signs also demonstrated a notable impact, while ER status contributed least to the model’s output. The performance of the ML models in differentiating between HER2-positive and HER2-negative breast cancer was evaluated using ROC curves. In the training set, the AUC was 0.70 (95% confidence interval (CI): 0.64–0.76) for DT, 0.74 (95% CI: 0.69–0.80) for k-NN, 0.82 (95% CI: 0.77–0.86) for ANN, and 0.80 (95% CI: 0.75–0.84) for LR (Figure 2C; Table 3). The SVM model had the highest AUC of 0.86 (95% CI: 0.81–0.90), with a sensitivity of 0.81 (95% CI: 0.70–0.90), a specificity of 0.80 (95% CI: 0.74–0.89), and an accuracy of 0.80 (95% CI: 0.76–0.86) (Table 3). In the internal validation set, the ANN model demonstrated the highest discriminative ability, with an AUC of 0.77 (95% CI: 0.67–0.86), a sensitivity of 0.57 (95% CI: 0.43–0.83), a specificity of 0.89 (95% CI: 0.68–0.99), and an accuracy of 0.82 (95% CI: 0.69–0.88) (Figure 2D; Table 3). The AUC, sensitivity, specificity, and accuracy of the LR model were 0.74 (95% CI: 0.64–0.84), 0.62 (95% CI: 0.43–0.91), 0.84 (95% CI: 0.77–0.97), and 0.79 (95% CI: 0.73–0.88), respectively (Figure 2D; Table 3). Additionally, the LR model achieved the highest AUC in the external validation set (0.66 (95% CI: 0.56–0.76)), with a sensitivity of 0.59 (95% CI: 0.50–1.00), a specificity of 0.69 (95% CI: 0.25–0.79), and an accuracy of 0.67 (95% CI: 0.40–0.75) (Figure 2E; Table 3), although the overall performance was modest. The calibration curves for the models exhibited good agreement between the predicted risks and the observed probabilities across all three sets (Figure 3). The DCA (Figure 4) revealed that the net benefits of the various models in predicting HER2-positive and HER2-negative breast cancer across the three sets were high, indicating that the ML models have good clinical utility and practical application potential. The DCA-derived optimal threshold probabilities for clinical decision-making varied by model and dataset, ranging from 0.31 to 1.00 for this classification task (Table 3). Notably, the k-NN model produced class labels of 0 or 1 rather than probabilities of event occurrence; therefore, no DCA or calibration curve analysis was performed for the k-NN model.
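The net benefit plotted in a decision curve can be sketched as below, assuming the standard definition: true positives per patient minus false positives per patient, weighted by the odds of the threshold probability. The text does not spell out the formula, so this definition, the function names, and the toy data are all assumptions for illustration.

```python
from typing import Sequence


def net_benefit(
    labels: Sequence[int], probabilities: Sequence[float], threshold: float
) -> float:
    """Net benefit of acting on predictions with probability >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    if n == 0 or n != len(probabilities):
        raise ValueError("labels and probabilities must be non-empty and aligned")
    tp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy: classify every lesion as positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if not labels:
        raise ValueError("labels must be non-empty")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy data for illustration only (1 = HER2-positive, 0 = HER2-negative).
    y = [1, 1, 0, 0, 0]
    p = [0.9, 0.4, 0.5, 0.3, 0.1]
    for t in (0.2, 0.4, 0.6):
        print(t, net_benefit(y, p, t), net_benefit_treat_all(y, t))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none strategy). A model that outputs only hard 0 or 1 labels classifies the same cases as positive at every threshold, so its net benefit varies only through the threshold weight; this is consistent with the omission of the k-NN model from the DCA.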

Figure 2. Model for differentiating HER2-positive from HER2-negative breast cancer. (A) Nomogram for predicting HER2-positive status, developed using the external validation set. (B) SHAP summary plot for the model differentiating HER2-positive from HER2-negative breast cancer. Each point represents a patient from the training set. The position on the x-axis shows the impact on the model output (SHAP value), and the color represents the feature value (yellow for high, purple for low). (C–E) ROC curves of the machine learning models in the (C) training, (D) internal validation, and (E) external validation sets.

Table 3. Performance of 5 ML models in differentiating HER2-positive and HER2-negative breast cancer.

Figure 3. Model calibration for differentiating HER2-positive from HER2-negative breast cancer. Calibration curves for the (A) DT, (B) SVM, (C) ANN, and (D) LR models are shown for three sets. The dashed line represents ideal calibration.
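As an illustration of what a calibration curve plots, the sketch below groups predicted probabilities into equal-width bins and compares the mean predicted risk with the observed event rate in each bin. The binning scheme, the function name, and the toy data are assumptions made for this example; the article does not specify how the curves in Figure 3 were constructed.

```python
def calibration_bins(y_true, y_prob, n_bins=5):
    """Return (mean predicted risk, observed event rate, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))
    curve = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        curve.append((mean_pred, observed, len(members)))
    return curve


# Hypothetical toy data.
y_prob = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
y_true = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

curve = calibration_bins(y_true, y_prob)
for mean_pred, observed, count in curve:
    print(f"predicted={mean_pred:.2f} observed={observed:.2f} n={count}")
```

Points close to the diagonal (predicted equal to observed) indicate good calibration, which corresponds to the dashed ideal line in the figure.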

Figure 4. DCA curves for HER2-positive and HER2-negative breast cancer. The DCA curves of the (A) DT, (B) SVM, (C) ANN, and (D) LR models across all datasets. The black line represents the net benefit when no individuals receive the intervention (net benefit = 0), while the gray line represents the net benefit when all individuals receive the intervention.

3.4 Construction and performance evaluation of the models for differentiating HER2-low from HER2-zero breast cancer

The variables of Ki-67 status, lesion number, distribution quadrant of lesions, mass shape, EER, and ADC were used for the construction of the predictive models and the nomogram (Figure 5A). SHAP analysis provided critical insights into the model’s decision-making process (Figure 5B). Mass shape emerged as the most influential feature, with irregular morphology substantially increasing the prediction probability for HER2-low classification. ADC values also demonstrated a significant impact, showing an inverse relationship with HER2-low probability, while the distribution quadrant of lesions contributed least to the model’s predictions. The performance of ML models in predicting HER2-low and HER2-zero breast cancer is summarized in Table 4 and Figures 5C–E. The SVM model achieved the highest AUC of 0.87 (95% CI: 0.83–0.91) in the training set, compared to DT with an AUC of 0.68 (95% CI: 0.62–0.73), k-NN with an AUC of 0.85 (95% CI: 0.80–0.89), ANN with an AUC of 0.78 (95% CI: 0.74–0.82), and LR with an AUC of 0.76 (95% CI: 0.70–0.81) (Figure 5C; Table 4). Conversely, the LR model achieved the highest AUCs in both the internal and external validation sets, with values of 0.67 (95% CI: 0.58–0.76) and 0.74 (95% CI: 0.65–0.83), respectively (Figures 5D,E; Table 4). The calibration curve (Figure 6) and DCA (Figure 7) illustrated that the ML models were well-calibrated and provided substantial clinical net benefits in predicting HER2-low and HER2-zero breast cancer across all three sets. The optimal threshold probabilities from the DCA for distinguishing HER2-low from HER2-zero tumors are presented in Table 4.
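For readers who want to see how an AUC and an interval such as "0.87 (95% CI: 0.83–0.91)" can be obtained, the following is a minimal sketch using the rank-based definition of the AUC and a percentile bootstrap. The bootstrap is an assumption chosen for illustration (the article does not state how its confidence intervals were computed), and the function names and toy data are hypothetical.

```python
import random


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("need at least one positive and one negative case")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return lower, upper


# Hypothetical toy data: 1 = HER2-low, 0 = HER2-zero.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

point = auc(y_true, scores)
low, high = bootstrap_ci(y_true, scores, n_boot=500)
print(f"AUC={point:.3f} (95% CI: {low:.3f}-{high:.3f})")
```

With cohorts as small as this toy example the interval is wide. The narrower intervals reported for the training set than for the validation sets in Table 4 are consistent with the larger training sample.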

Figure 5. Model for differentiating HER2-low from HER2-zero breast cancer. (A) Nomogram for predicting HER2-low status, developed using the external validation set. (B) SHAP summary plot for the model differentiating HER2-low from HER2-zero breast cancer. The position on the x-axis shows the impact on the model output (SHAP value), and the color represents the feature value (yellow for high, purple for low). (C–E) ROC curves of the machine learning models in the (C) training, (D) internal validation, and (E) external validation sets.

Table 4. Performance of 5 ML models in differentiating HER2-low and HER2-zero breast cancer.

Figure 6. Calibration curves for HER2-low and HER2-zero breast cancer. Calibration curves of the (A) DT, (B) SVM, (C) ANN, and (D) LR models in three sets. The dashed line represents perfect calibration.

Figure 7. DCA curves for HER2-low and HER2-zero breast cancer. The DCA curves of the (A) DT, (B) SVM, (C) ANN, and (D) LR models across all datasets. The black line represents the net benefit when no individuals receive the intervention (net benefit = 0), while the gray line represents the net benefit when all individuals receive the intervention.

3.5 Enhanced diagnostic performance of radiologists with nomogram models

In the external validation set, the LR models demonstrated the highest performance in differentiating between HER2-positive and HER2-negative breast cancer or between HER2-low and HER2-zero breast cancer. Then, nomogram models were constructed based on these models (Figures 2A, 5A). We compared the nomogram’s performance with that of two radiologists, finding that the nomogram outperformed both radiologists (Figure 2E; Table 3). The use of the nomogram enhanced the sensitivity, specificity, and accuracy of both radiologists (Table 3). In the evaluation of HER2-positive and HER2-negative breast cancer, the senior radiologist (Radiologist 1) exhibited improvements in the AUC, sensitivity, specificity, and accuracy by 0.22, 0.07, 0.16, and 0.14, respectively (Figure 2E; Table 3). For the junior radiologist (Radiologist 2), the enhancements in AUC, sensitivity, specificity, and accuracy were 0.24, 0.13, 0.18, and 0.14, respectively (Table 3; Figure 2E). In the analysis of HER2-low or HER2-zero breast cancer, the utilization of the nomogram led to an AUC, sensitivity, specificity, and accuracy of 0.77 (95% CI: 0.68–0.86), 0.83 (95% CI: 0.66–0.97), 0.74 (95% CI: 0.66–0.83), and 0.76 (95% CI: 0.69–0.84) for Radiologist 2, and 0.81 (95% CI: 0.72–0.89), 0.86 (95% CI: 0.72–0.97), 0.77 (95% CI: 0.69–0.85), and 0.80 (95% CI: 0.72–0.86) for Radiologist 1, respectively (Table 4; Figure 5E). Therefore, the integration of nomogram models significantly improved diagnostic accuracy in differentiating HER2-positive from HER2-negative and HER2-low from HER2-zero breast cancer, surpassing the performance of radiologists and highlighting the potential for enhanced clinical decision-making.
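The reader-study comparison above rests on differences in sensitivity, specificity, and accuracy between unaided and nomogram-assisted reads. A minimal sketch of that arithmetic follows; the reads below are invented toy data, not values from Table 3 or Table 4, and the function name is hypothetical.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = positive class).

    Assumes both classes are present in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


# Hypothetical reads by one radiologist on the same ten lesions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
unaided_read = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
aided_read = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

unaided = diagnostic_metrics(y_true, unaided_read)
aided = diagnostic_metrics(y_true, aided_read)
gain = {name: aided[name] - unaided[name] for name in aided}

for name in gain:
    print(f"{name}: {unaided[name]:.2f} -> {aided[name]:.2f} (change {gain[name]:+.2f})")
```

Reporting the change per metric, as the text does for each radiologist, shows whether the nomogram helps mainly by recovering missed positives (sensitivity) or by reducing false alarms (specificity).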

4 Discussion

Traditionally, HER2-low tumors were classified as HER2-negative, which led to missed opportunities for patients to receive anti-HER2 therapies (Marchio et al., 2021; Modi et al., 2022; Xin et al., 2022; Zhang et al., 2022; Yang et al., 2023). Current research indicates that nearly 50% of patients with HER2-negative breast cancer are classified as HER2-low, thereby enabling them to benefit from novel anti-HER2 treatments (Marchio et al., 2021; Modi et al., 2022; Xin et al., 2022; Zhang et al., 2022; Yang et al., 2023). Although preoperative biopsy can provide histological information, the biopsy samples may capture only focal aspects of the tumor and may not adequately represent the entire tumor compared to surgical samples (Chen et al., 2023). Preoperative breast MRI, however, provides comprehensive three-dimensional data, which is widely used to classify various histological types and molecular markers (Zhou et al., 2021; Ramtohul et al., 2023; Guo et al., 2024). This study investigated the value of clinical factors and imaging features extracted from preoperative MRI examinations for predicting the HER2 expression levels in breast cancer patients. The results demonstrated that clinicopathological and MRI features may serve as independent predictors for differentiating between HER2-positive and HER2-negative tumors, as well as for distinguishing HER2-low from HER2-zero tumors.

Our results demonstrated remarkable performance across the training, internal, and external validation sets in predicting HER2 status. Several clinicopathological variables, namely histologic grade, ER status, PR status, HR status, and Ki-67 index, differed significantly among HER2-positive, HER2-low, and HER2-zero breast cancer patients. ER-negative and PR-negative statuses were more frequently observed in HER2-positive patients than in those with HER2-negative breast cancer, alongside higher histological grades and increased Ki-67 levels. Higher HR, ER, and PR statuses were noted in HER2-low tumors compared to HER2-zero cancers, associated with lower histological grades and lower Ki-67 levels, which is consistent with previous reports (Won et al., 2022; Chen et al., 2024). This may be attributed to the role of HER2 as a proto-oncogene that inhibits apoptosis and promotes proliferation. Such events are closely associated with various biological behaviors, including tumor cell invasion and metastasis (Zhu et al., 2021). Moreover, we found that compared to HER2-negative cancers, HER2-positive cancers were larger and exhibited more aggressive imaging features, including multifocal lesions, intratumoral and peritumoral edema, ipsilateral vascularity increase, lymphadenectasis, and other accompanying signs, which is consistent with the results of Zhou et al. (2025). The proportions of tumors with NME in HER2-positive and HER2-negative breast cancer patients were 28.9% and 12.8%, respectively. This difference is primarily associated with the greater presence of intraductal components in HER2-positive breast cancer (Seyfettin et al., 2022). Patients with HER2-low breast cancer were more likely to present with multifocality and an irregular mass shape than patients with HER2-zero tumors. Statistically significant differences were observed in the EER and mean minimum ADC values between HER2-low and HER2-zero breast cancers.
These results are consistent with our previous findings (Zhao et al., 2024).

In our study, multivariate analyses were conducted to select clinicopathological and MRI features for the development of ML models. For distinguishing HER2-negative from HER2-positive tumors, the SVM model achieved the highest AUC of 0.86 (95% CI: 0.81–0.90) in the training set, while the LR model recorded AUCs of 0.74 (95% CI: 0.64–0.84) and 0.66 (95% CI: 0.56–0.76) in the internal and external validation sets, respectively. A decrease in performance in the external validation set is common in multicenter studies. Despite this expected attenuation, the model retained discriminative ability above chance, with the lower bound of the 95% CI for the AUC remaining above 0.5. These models showed good calibration and high net benefits in predicting HER2-positive and HER2-negative breast cancer across the three sets. These results are comparable to those of Zhou et al. (2025). Discrimination between HER2-low and HER2-zero cancers is critical, as lower agreement and accuracy among pathologists have been noted when interpreting scanned slides of HER2 1+ and HER2 0 scores in biopsies (Fernandez et al., 2022). In this study, the SVM model achieved the highest AUC of 0.87 (95% CI: 0.83–0.91) in the training set for differentiating HER2-low from HER2-zero tumors. Notably, the LR model outperformed all other models, including the ANN, in both validation sets, yielding AUCs of 0.67 (95% CI: 0.58–0.76) and 0.74 (95% CI: 0.65–0.83) in the internal and external validation sets, respectively. These models demonstrated excellent calibration and provided significant clinical net benefits in predicting HER2-low and HER2-zero breast cancer across all three cohorts. Our results were comparable to predictions made by the radiomics model using breast MRI (Zhang et al., 2021; Xu et al., 2022; Zhou et al., 2025). The successful external validation, despite a slight performance drop, underscores the model’s practical generalizability and facilitates its potential application in diverse clinical settings.
Overall, integrating ML-derived qualitative and quantitative features into the routine workflow as a supportive diagnostic tool will enhance our evaluation of HER2 status across the entire tumor.

Recently, imaging analysis driven by AI has emerged as a robust approach for extracting numerous quantitative characteristics from tumors. A standard breast MRI protocol involves the acquisition of various types of images, including T1-weighted imaging, T2WI, diffusion-weighted imaging, and DCE-MRI, which provide substantial information to train AI models for classification tasks. The predominant technique is radiomics analysis, enabling the direct extraction of features from the delineated tumors (Chen et al., 2020; Kim et al., 2021; Bian et al., 2023; Ramtohul et al., 2023; Zhang et al., 2023; Chen et al., 2024; Guo et al., 2024; Peng et al., 2024; Zheng et al., 2024). Several recent studies have established multiple models for identifying HER2 statuses using MRI findings or classic radiomics, achieving relatively strong performance (Bian et al., 2023; Ramtohul et al., 2023; Chen et al., 2024; Guo et al., 2024; Peng et al., 2024; Zheng et al., 2024). However, a significant limitation of radiomics models is their lack of interpretability, which hinders their clinical application (Chen et al., 2020; Kim et al., 2021; Bian et al., 2023; Ramtohul et al., 2023; Zhang et al., 2023; Chen et al., 2024; Guo et al., 2024; Peng et al., 2024; Zheng et al., 2024). Clinicians often find it challenging to understand how these models reach their conclusions and to identify which radiomic features are critical in the decision-making process. Our study directly addresses this “black-box” concern. By employing SHAP analysis, we have quantified and visualized the contribution of each feature, thereby making the model’s decision-making process transparent. Clinicians can not only obtain a prediction but also understand why that prediction was made—for instance, seeing that an irregular mass shape and a low ADC value were the key drivers for classifying a case as HER2-low. 
Additionally, radiomics models require specialized software, potentially limiting their utility in routine clinical practice (Chen et al., 2020; Kim et al., 2021; Bian et al., 2023; Ramtohul et al., 2023; Zhang et al., 2023; Chen et al., 2024; Guo et al., 2024; Peng et al., 2024; Zheng et al., 2024). In contrast to radiomics models, our study only collected clinicopathological data from the electronic medical record system and obtained radiologic imaging features directly from daily imaging workstations, without the need for complex three-dimensional tumor target delineation at various phases of enhanced scanning, which is computationally intensive. Notably, the clinicopathological and MRI features identified in our study, derived from a larger sample size and encompassing a more diverse set of features that are likely more representative, demonstrated comparable or slightly improved performance relative to established findings (Chen et al., 2020; Kim et al., 2021; Bian et al., 2023; Ramtohul et al., 2023; Zhang et al., 2023; Chen et al., 2024; Guo et al., 2024; Peng et al., 2024; Zhan et al., 2024; Zheng et al., 2024; Zhou et al., 2025).
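To illustrate how SHAP attributes a single prediction to individual features, the sketch below uses the closed-form SHAP values of a linear (logistic regression) model on the log-odds scale, which hold under the assumption of independent features: each contribution is the coefficient times the feature's deviation from its average in the background data. The coefficients, feature names, and patient values are hypothetical and are not taken from the fitted nomogram, and the article does not state which SHAP explainer was used.

```python
def linear_shap(coefs, x, background_means):
    """SHAP values (log-odds scale) for a linear model, assuming independent features."""
    return {name: coefs[name] * (x[name] - background_means[name]) for name in coefs}


def log_odds(intercept, coefs, x):
    """Linear predictor of a logistic regression model."""
    return intercept + sum(coefs[name] * x[name] for name in coefs)


# Hypothetical model: positive coefficients push towards HER2-low.
intercept = -0.4
coefs = {"irregular_shape": 1.2, "adc": -1.5, "multifocal": 0.6}

# Hypothetical feature averages in the training (background) data.
background_means = {"irregular_shape": 0.5, "adc": 1.0, "multifocal": 0.3}

# Hypothetical patient: irregular mass, ADC below average, single lesion.
patient = {"irregular_shape": 1.0, "adc": 0.8, "multifocal": 0.0}

phi = linear_shap(coefs, patient, background_means)
base_value = log_odds(intercept, coefs, background_means)
prediction = log_odds(intercept, coefs, patient)

for name, value in sorted(phi.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: {value:+.2f}")
print(f"base value {base_value:+.2f} + contributions = {base_value + sum(phi.values()):+.2f}")
```

The contributions sum to the difference between this patient's log-odds and the average prediction, which is the additivity property that lets a SHAP summary plot rank features by influence. In this toy case the irregular shape and the below-average ADC both push the prediction towards HER2-low, mirroring the pattern described in the text.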

Radiologists typically rely on morphological characteristics, such as tumor diameter, mass shape, mass margin, lesion enhancement type, enhancement rate, and ADC value, to diagnose breast lesions (Pinker et al., 2013). To our knowledge, this is the first study in which radiologists evaluated HER2 expression levels using breast MRI images. However, diagnostic accuracy often depends on the radiologist’s professional experience. The interpretation of breast MRI images can vary significantly among different observers, particularly among less experienced radiologists (Zhang et al., 2023). In the external validation set, we compared the predictive performance of the nomogram with the visual assessments made by radiologists. The nomogram demonstrated significantly higher sensitivity than that of both radiologists. By utilizing the nomogram, both sensitivity and specificity, as well as overall diagnostic accuracy, improved for the two radiologists, with more notable enhancements observed for the junior radiologist. This suggests that the high-dimensional features captured more nuanced information from breast MRI than could be identified by the naked eye. Therefore, the nomogram constructed in our study may significantly enhance the diagnostic capabilities of radiologists, particularly for those who are junior. The superior performance of the nomogram prompts the question of its decision-making rationale. Our SHAP analysis demystifies this process by quantifying the contribution of each feature, thereby building the trust necessary for clinical adoption. For discriminating HER2-positive from HER2-negative tumors, the model appropriately prioritized PR status, reflecting the well-established mutual inhibition between HER2 and hormone receptor pathways. Conversely, when identifying the clinically challenging HER2-low subtype, imaging phenotypes—specifically, mass shape and ADC values—surpassed receptor status in importance. 
This suggests that HER2-low tumors may exhibit distinct morphological and cellular characteristics discernible via MRI, providing a novel, non-invasive perspective for their identification.

Beyond diagnostic accuracy, we explored the potential for future clinical utility through a preliminary, data-driven estimation derived from our retrospective cohort. The results revealed that in our external validation, the use of the nomogram was associated with a reduction in the average interpretation time for a junior radiologist in our dataset. Furthermore, applying a conservative prediction threshold (probability of HER2-negative >0.95) identified a subset of patients for whom the model output a very low probability of positive or low HER2 expression. While these findings point to directions for future research into workflow efficiency and patient stratification, they remain speculative. The model’s performance and any potential impact on clinical workflows or decision-making must be rigorously tested in prospective studies before any clinical application can be considered.

This study has several limitations. First, it employed a retrospective design, which may introduce potential biases in patient selection. Second, the analysis was limited to three datasets derived from only two medical centers. Investigating a larger volume of prospectively collected data from additional centers could enhance the robustness of findings in future research. Third, a notable performance drop was observed in the external validation set. This is an expected phenomenon often attributed to domain shift, such as differences in MRI scanners, acquisition protocols, and patient populations across institutions (Park and Han, 2018; Kelly et al., 2019). While the successful external validation, despite this attenuation, underscores the model’s practical utility, the performance drop highlights the challenge of cross-institutional generalization. Future work should incorporate strategies such as post-acquisition image harmonization (e.g., ComBat) or domain adaptation techniques to improve model robustness. Fourth, the study excluded pure NME lesions. This decision was made because the standard BI-RADS features used in our model (e.g., mass shape, margin) are not directly applicable to NME, which is characterized by different imaging patterns. While this allowed us to develop a robust model for mass-forming tumors, it introduces a selection bias and limits the generalizability of our findings, particularly as NME is more common in HER2-positive breast cancers. Future studies are warranted to develop specific criteria or models for preoperatively predicting HER2 status in pure NME lesions, which would provide a more comprehensive clinical tool.

In conclusion, this study highlights the promising potential of integrating ML algorithms with clinicopathological and preoperative MRI characteristics for the classification of HER2 status in breast cancer. Our findings demonstrate that these models can effectively differentiate between HER2-positive, HER2-low, and HER2-zero tumors, potentially facilitating more informed therapeutic decisions. The nomogram models, specifically, significantly enhanced the diagnostic accuracy of radiologists, particularly benefiting those with less experience. Given the critical role of accurate HER2 classification in guiding therapeutic strategies, we advocate for the incorporation of these nomograms into routine clinical practice. Future multicentric studies with larger patient cohorts are necessary to further validate these models and ensure their robustness across diverse clinical settings.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the Ethics Committee of the Second Hospital of Shandong University (Approval No: KYLL2025281; Date: 21 Feb 2025). The studies were conducted in accordance with the local legislation and institutional requirements. The ethics committee/institutional review board waived the requirement of written informed consent for participation from the participants or the participants’ legal guardians/next of kin because of the nature of the retrospective analysis.

Author contributions

SZ: Conceptualization, Data curation, Investigation, Resources, Writing – original draft, Writing – review and editing. ZL: Conceptualization, Data curation, Resources, Software, Writing – review and editing. YW: Investigation, Resources, Software, Writing – review and editing. FZ: Investigation, Writing – review and editing. PC: Formal Analysis, Writing – review and editing. GP: Conceptualization, Data curation, Project administration, Supervision, Writing – review and editing.

Funding

The authors declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcell.2025.1669651/full#supplementary-material

References

Allison, K. H., Hammond, M. E. H., Dowsett, M., McKernin, S. E., Carey, L. A., Fitzgibbons, P. L., et al. (2020). Estrogen and progesterone receptor testing in breast cancer: ASCO/CAP guideline update. J. Clin. Oncol. 38 (12), 1346–1366. doi:10.1200/JCO.19.02309

Baltzer, P., Mann, R. M., Iima, M., Sigmund, E. E., Clauser, P., Gilbert, F. J., et al. (2020). Diffusion-weighted imaging of the breast-a consensus and mission statement from the EUSOBI international breast diffusion-weighted imaging working group. Eur. Radiol. 30 (3), 1436–1450. doi:10.1007/s00330-019-06510-3

Barzaman, K., Karami, J., Zarei, Z., Hosseinzadeh, A., Kazemi, M. H., Moradi-Kalbolandi, S., et al. (2020). Breast cancer: biology, biomarkers, and treatments. Int. Immunopharmacol. 84, 106535. doi:10.1016/j.intimp.2020.106535

Bian, X., Du, S., Yue, Z., Gao, S., Zhao, R., Huang, G., et al. (2023). Potential antihuman epidermal growth factor receptor 2 target therapy beneficiaries: the role of MRI-based radiomics in distinguishing human epidermal growth factor receptor 2-Low status of breast cancer. J. Magn. Reson Imaging 58 (5), 1603–1614. doi:10.1002/jmri.28628

Brenner, D. R., Weir, H. K., Demers, A. A., Ellison, L. F., Louzado, C., Shaw, A., et al. (2020). Projected estimates of cancer in Canada in 2020. CMAJ 192 (9), E199–E205. doi:10.1503/cmaj.191292

Chen, X., Chen, X., Yang, J., Li, Y., Fan, W., and Yang, Z. (2020). Combining dynamic contrast-enhanced magnetic resonance imaging and apparent diffusion coefficient maps for a radiomics nomogram to predict pathological complete response to neoadjuvant chemotherapy in breast cancer patients. J. Comput. Assist. Tomogr. 44 (2), 275–283. doi:10.1097/RCT.0000000000000978

Chen, R., Qi, Y., Huang, Y., Liu, W., Yang, R., Zhao, X., et al. (2023). Diagnostic value of core needle biopsy for determining HER2 status in breast cancer, especially in the HER2-low population. Breast Cancer Res. Treat. 197 (1), 189–200. doi:10.1007/s10549-022-06781-3

Chen, H., Liu, Y., Zhao, J., Jia, X., Chai, F., Peng, Y., et al. (2024). Quantification of intratumoral heterogeneity using habitat-based MRI radiomics to identify HER2-positive, -low and -zero breast cancers: a multicenter study. Breast Cancer Res. 26 (1), 160. doi:10.1186/s13058-024-01921-7

Fernandez, A. I., Liu, M., Bellizzi, A., Brock, J., Fadare, O., Hanley, K., et al. (2022). Examination of low ERBB2 protein expression in breast cancer tissue. JAMA Oncol. 8 (4), 1–4. doi:10.1001/jamaoncol.2021.7239

Guo, Y., Xie, X., Tang, W., Chen, S., Wang, M., Fan, Y., et al. (2024). Noninvasive identification of HER2-low-positive status by MRI-based deep learning radiomics predicts the disease-free survival of patients with breast cancer. Eur. Radiol. 34 (2), 899–913. doi:10.1007/s00330-023-09990-6

Kelly, C. J., Karthikesalingam, A., Suleyman, M., Corrado, G., and King, D. (2019). Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 17 (1), 195. doi:10.1186/s12916-019-1426-2

Kim, S. Y., Cho, N., Choi, Y., Lee, S. H., Ha, S. M., Kim, E. S., et al. (2021). Factors affecting pathologic complete response following neoadjuvant chemotherapy in breast cancer: development and validation of a predictive nomogram. Radiology 299 (2), 290–300. doi:10.1148/radiol.2021203871

Lawton, T. J. (2023). Update on the use of molecular subtyping in breast cancer. Adv. Anat. Pathol. 30 (6), 368–373. doi:10.1097/PAP.0000000000000416

Lee, J., Lee, Y. J., Bae, S. J., Baek, S. H., Kook, Y., Cha, Y. J., et al. (2023). Ki-67, 21-Gene recurrence score, endocrine resistance, and survival in patients with breast cancer. JAMA Netw. Open 6 (8), e2330961. doi:10.1001/jamanetworkopen.2023.30961

Loibl, S., Jassem, J., Sonnenblick, A., Parlier, D., Winer, E., Bergh, J., et al. (2022). VP6-2022: adjuvant pertuzumab and trastuzumab in patients with early HER-2 positive breast cancer in APHINITY: 8.4 years' follow-up. Ann. Oncol. 33 (9), 986–987. doi:10.1016/j.annonc.2022.06.009

Marchio, C., Annaratone, L., Marques, A., Casorzo, L., Berrino, E., and Sapino, A. (2021). Evolving concepts in HER2 evaluation in breast cancer: heterogeneity, HER2-low carcinomas and beyond. Semin. Cancer Biol. 72, 123–135. doi:10.1016/j.semcancer.2020.02.016

Modi, S., Jacot, W., Yamashita, T., Sohn, J., Vidal, M., Tokunaga, E., et al. (2022). Trastuzumab deruxtecan in previously treated HER2-Low advanced breast cancer. N. Engl. J. Med. 387 (1), 9–20. doi:10.1056/NEJMoa2203690

Park, S. H., and Han, K. (2018). Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction. Radiology 286 (3), 800–809. doi:10.1148/radiol.2017171920

Peng, Y., Zhang, X., Qiu, Y., Li, B., Yang, Z., Huang, J., et al. (2024). Development and validation of MRI radiomics models to differentiate HER2-Zero, -Low, and -Positive breast cancer. AJR Am. J. Roentgenol. 222 (4), e2330603. doi:10.2214/AJR.23.30603

Pinker, K., Bickel, H., Helbich, T. H., Gruber, S., Dubsky, P., Pluschnig, U., et al. (2013). Combined contrast-enhanced magnetic resonance and diffusion-weighted imaging reading adapted to the “Breast Imaging Reporting and Data System” for multiparametric 3-T imaging of breast lesions. Eur. Radiol. 23 (7), 1791–1802. doi:10.1007/s00330-013-2771-8

Rakha, E. A., El-Sayed, M. E., Lee, A. H., Elston, C. W., Grainge, M. J., Hodi, Z., et al. (2008). Prognostic significance of Nottingham histologic grade in invasive breast carcinoma. J. Clin. Oncol. 26 (19), 3153–3158. doi:10.1200/JCO.2007.15.5986

Ramtohul, T., Djerroudi, L., Lissavalid, E., Nhy, C., Redon, L., Ikni, L., et al. (2023). Multiparametric MRI and radiomics for the prediction of HER2-Zero, -Low, and -Positive breast cancers. Radiology 308 (2), e222646. doi:10.1148/radiol.222646

Seyfettin, A., Dede, I., Hakverdi, S., Duzel Asig, B., Temiz, M., and Karazincir, S. (2022). MR imaging properties of breast cancer molecular subtypes. Eur. Rev. Med. Pharmacol. Sci. 26 (11), 3840–3848. doi:10.26355/eurrev_202206_28951

Szymiczek, A., Lone, A., and Akbari, M. R. (2021). Molecular intrinsic versus clinical subtyping in breast cancer: a comprehensive review. Clin. Genet. 99 (5), 613–637. doi:10.1111/cge.13900

Tarantino, P., Viale, G., Press, M. F., Hu, X., Penault-Llorca, F., Bardia, A., et al. (2023). ESMO expert consensus statements (ECS) on the definition, diagnosis, and management of HER2-low breast cancer. Ann. Oncol. 34 (8), 645–659. doi:10.1016/j.annonc.2023.05.008

Wolff, A. C., Hammond, M. E. H., Allison, K. H., Harvey, B. E., Mangu, P. B., Bartlett, J. M. S., et al. (2018). Human epidermal growth factor receptor 2 testing in breast cancer: American Society of Clinical Oncology/College of American Pathologists clinical practice guideline focused update. J. Clin. Oncol. 36 (20), 2105–2122. doi:10.1200/JCO.2018.77.8738

Won, H. S., Ahn, J., Kim, Y., Kim, J. S., Song, J. Y., Kim, H. K., et al. (2022). Clinical significance of HER2-low expression in early breast cancer: a nationwide study from the Korean Breast Cancer Society. Breast Cancer Res. 24 (1), 22. doi:10.1186/s13058-022-01519-x

Xie, T., Zhao, Q., Fu, C., Bai, Q., Zhou, X., Li, L., et al. (2019). Differentiation of triple-negative breast cancer from other subtypes through whole-tumor histogram analysis on multiparametric MR imaging. Eur. Radiol. 29 (5), 2535–2544. doi:10.1007/s00330-018-5804-5

Xin, L., Wu, Q., Zhan, C., Qin, H., Xiang, H., Xu, L., et al. (2022). Multicenter study of the clinicopathological features and recurrence risk prediction model of early-stage breast cancer with low-positive human epidermal growth factor receptor 2 expression in China (Chinese Society of Breast Surgery 021). Chin. Med. J. Engl. 135 (6), 697–706. doi:10.1097/CM9.0000000000002056

Xu, A., Chu, X., Zhang, S., Zheng, J., Shi, D., Lv, S., et al. (2022). Development and validation of a clinicoradiomic nomogram to assess the HER2 status of patients with invasive ductal carcinoma. BMC Cancer 22 (1), 872. doi:10.1186/s12885-022-09967-6

Yang, S. H., Lin, J., Lu, F., Han, Z. H., Fu, C. X., Lv, P., et al. (2017). Evaluation of antiangiogenic and antiproliferative effects of sorafenib by sequential histology and intravoxel incoherent motion diffusion-weighted imaging in an orthotopic hepatocellular carcinoma xenograft model. J. Magn. Reson. Imaging 45 (1), 270–280. doi:10.1002/jmri.25344

Yang, C., Zhang, X., Chen, Y., Li, P., Zhang, J., Xu, A., et al. (2023). Survival differences between HER2-0 and HER2-low-expressing breast cancer - a meta-analysis of early breast cancer patients. Crit. Rev. Oncol. Hematol. 185, 103962. doi:10.1016/j.critrevonc.2023.103962

Zhan, T., Dai, J., and Li, Y. (2024). Noninvasive identification of HER2-zero, -low, or -overexpressing breast cancers: multiparametric MRI-based quantitative characterization in predicting HER2-low status of breast cancer. Eur. J. Radiol. 177, 111573. doi:10.1016/j.ejrad.2024.111573

Zhang, Y., Chen, J. H., Lin, Y., Chan, S., Zhou, J., Chow, D., et al. (2021). Prediction of breast cancer molecular subtypes on DCE-MRI using convolutional neural network with transfer learning between two centers. Eur. Radiol. 31 (4), 2559–2567. doi:10.1007/s00330-020-07274-x

Zhang, G., Ren, C., Li, C., Wang, Y., Chen, B., Wen, L., et al. (2022). Distinct clinical and somatic mutational features of breast tumors with high-, low-, or non-expressing human epidermal growth factor receptor 2 status. BMC Med. 20 (1), 142. doi:10.1186/s12916-022-02346-9

Zhang, S., Shao, H., Li, W., Zhang, H., Lin, F., Zhang, Q., et al. (2023). Intra- and peritumoral radiomics for predicting malignant BiRADS category 4 breast lesions on contrast-enhanced spectral mammography: a multicenter study. Eur. Radiol. 33 (8), 5411–5422. doi:10.1007/s00330-023-09513-3

Zhao, S., Chen, P., Wang, X., Zheng, Z., Hui, R., and Pang, G. (2024). Preoperatively predicting human epidermal growth factor receptor 2-low expression in breast cancer using neural network model based on multiparameter magnetic resonance imaging. Quant. Imaging Med. Surg. 14 (12), 8387–8401. doi:10.21037/qims-24-428

Zheng, S., Yang, Z., Du, G., Zhang, Y., Jiang, C., Xu, T., et al. (2024). Discrimination between HER2-overexpressing, -low-expressing, and -zero-expressing statuses in breast cancer using multiparametric MRI-based radiomics. Eur. Radiol. 34 (9), 6132–6144. doi:10.1007/s00330-024-10641-7

Zhou, J., Tan, H., Li, W., Liu, Z., Wu, Y., Bai, Y., et al. (2021). Radiomics signatures based on multiparametric MRI for the preoperative prediction of the HER2 status of patients with breast cancer. Acad. Radiol. 28 (10), 1352–1360. doi:10.1016/j.acra.2020.05.040

Zhou, J., Zhang, Y., Miao, H., Yoon, G. Y., Wang, J., Lin, Y., et al. (2025). Preoperative differentiation of HER2-Zero and HER2-Low from HER2-Positive invasive ductal breast cancers using BI-RADS MRI features and machine learning modeling. J. Magn. Reson. Imaging 61 (2), 928–941. doi:10.1002/jmri.29447

Zhu, C. R., Chen, K. Y., Li, P., Xia, Z. Y., and Wang, B. (2021). Accuracy of multiparametric MRI in distinguishing the breast malignant lesions from benign lesions: a meta-analysis. Acta Radiol. 62 (10), 1290–1297. doi:10.1177/0284185120963900

Keywords: magnetic resonance imaging, clinicopathological features, breast cancer, human epidermal growth factor 2, machine learning

Citation: Zhao S, Li Z, Wang Y, Zhao F, Chen P and Pang G (2025) Enhancing preoperative HER2 status classification of invasive breast cancers using machine learning models based on clinicopathological and MRI features: a multicenter study. Front. Cell Dev. Biol. 13:1669651. doi: 10.3389/fcell.2025.1669651

Received: 20 July 2025; Accepted: 10 November 2025;
Published: 26 November 2025.

Edited by:

Or Kakhlon, Hadassah Medical Center, Israel

Reviewed by:

Sarbjeet Makkar, University of Michigan, United States
Mohan Jayatilake, University of Peradeniya, Sri Lanka

Copyright © 2025 Zhao, Li, Wang, Zhao, Chen and Pang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Guodong Pang, pgd226@aliyun.com

Suhong Zhao and Zhaohua Li have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.