Intra- and Peritumoral Radiomics Model Based on Early DCE-MRI for Preoperative Prediction of Molecular Subtypes in Invasive Ductal Breast Carcinoma: A Multitask Machine Learning Study

Purpose The aim of this study is to investigate radiomics features extracted from the optimal peritumoral region and the intratumoral area on the early phase of dynamic contrast-enhanced MRI (DCE-MRI) for predicting molecular subtypes of invasive ductal breast carcinoma (IDBC). Methods A total of 422 IDBC patients with immunohistochemical and fluorescence in situ hybridization results from two hospitals (Center 1: 327 cases, Center 2: 95 cases) who underwent preoperative DCE-MRI were retrospectively enrolled. After image preprocessing, radiomic features were extracted from the intratumoral area and four peritumoral regions on DCE-MRI from two centers, and selected the optimal peritumoral region. Based on the intratumoral, peritumoral radiomics features, and clinical–radiological characteristics, five radiomics models were constructed through support vector machine (SVM) in multiple classification tasks related to molecular subtypes and visualized by nomogram. The performance of radiomics models was evaluated by receiver operating characteristic curves, confusion matrix, calibration curves, and decision curve analysis. Results A 6-mm peritumoral size was defined the optimal peritumoral region in classification tasks of hormone receptor (HR)-positive vs others, triple-negative breast cancer (TNBC) vs others, and HR-positive vs human epidermal growth factor receptor 2 (HER2)-enriched vs TNBC, and 8 mm was applied in HER2-enriched vs others. The combined clinical–radiological and radiomics models in three binary classification tasks (HR-positive vs others, HER2-enriched vs others, TNBC vs others) obtained optimal performance with AUCs of 0.838, 0.848, and 0.930 in the training cohort, respectively; 0.827, 0.813, and 0.879 in the internal test cohort, respectively; and 0.791, 0.707, and 0.852 in the external test cohort, respectively. Conclusion Radiomics features in the intratumoral and peritumoral regions of IDBC on DCE-MRI had a potential to predict the HR-positive, HER2-enriched, and TNBC molecular subtypes preoperatively.


INTRODUCTION
Breast cancer is one of the most common malignant tumors, and it has been the leading cause of cancer death among women aged 20 to 59 years (1,2). The invasive ductal breast carcinoma (IDBC), accounting for approximately 80% in all breast cancers, is the most common histological subtype of breast cancer (3,4). As a highly heterogeneous malignant tumor, IDBC included four main molecular subtypes including Luminal A, Luminal B, epidermal growth factor receptor 2 (HER2)-enriched, and triple-negative breast cancer (TNBC) (5,6). More importantly, molecular subtypes of IDBC are closely correlated with treatment strategies, therapeutic effects, and clinical outcomes (7)(8)(9). Currently, IDBC molecular subtyping mainly relied on pathological immunohistochemistry (IHC) and gene expression profiling (10). However, both IHC and gene expression profiling are timeconsuming and depend on resection or biopsy specimens, in which an accurate diagnosis before surgery is difficult to make. (11) Therefore, there has been more attention on developing new and noninvasive strategies to preoperatively assess molecular subtypes for guiding clinical decisions. MRI is commonly used in clinic for preoperative evaluation of breast cancer. Dynamic contrast-enhanced MRI (DCE-MRI) has been reported to reflect more detailed biological information of tumor through analyzing tumorous hemodynamic features (12,13). Several studies have reported that texture features derived from DCE-MRI were correlated with diverse biomarker levels, such as estrogen receptor (ER), progesterone receptor (PR), and HER2 (14,15).
Radiomics enables subtle imaging feature extraction and quantification for exploring the underlying associations between the features and tumoral pathophysiology (16). Recently, several studies found that the radiomics features extracted from DCE-MRI are partially correlated with the tumoral heterogeneity or biological behavior, including molecular subtypes, Ki-67 expression, and therapeutic effect of neoadjuvant chemotherapy (15,17,18). However, most previous studies mainly focused on the intratumoral region while neglecting the areas surrounding the tumor containing the peritumoral information. The interactions between intratumoral cells and peritumoral elements influence tumor evolution and progression (19). For example, interactions between tumor cells and mesenchymal cells in the peritumoral area could induce the cytokine release and promote the tumor immunosuppressive microenvironment formation, leading to tumor progression (20). Peritumoral edema and angiogenesis are correlated with the malignant behavior of the tumor (21,22). Several studies have explored the radiomics features in the peritumoral areas based on various image types, such as mammography, ultrasound, and MRI, and had preliminary results in tumor diagnosis, biological indicator prediction, and prognosis evaluation (23)(24)(25). Li et al. recently reported that the radiomics features in the intra-/peritumoral regions based on DCE-MRI are able to identify the HER2 and Ki-67 status in breast cancer (26). In Li's study, the peritumoral region was set as 4 mm from the tumor boundary; however, the optimal peritumoral size for breast cancer has rarely been studied.
In this study, we first aimed to clarify the optimal peritumoral region in differentiating the IDBC molecular subtypes. Moreover, the radiomics features in the intratumoral and peritumoral areas based on DCE-MRI were generated, and it was verified whether radiomics analysis in DCE-MRI is helpful for the preoperative discrimination of IDBC molecular subtypes.

Patient Population
This study was approved by the institutional review boards from both participating hospitals, and the requirement for informed consent was waived. The data of 472 IDBC women from the First Affiliated Hospital of Bengbu Medical College (Center 1, from May 2016 to March 2021) and 141 IDBC women from Sir Run Run Shaw Hospital, Zhejiang University (Center 2, from April 2018 to August 2019) were retrospectively analyzed. The patient inclusion criteria were as follows: 1) underwent DCE-MRI scans less than 1 month before biopsy or resection; 2) no surgery or treatment before MRI scans; 3) all patients were pathologically diagnosed by surgical resection sample or needle biopsy; and 4) solitary tumor. Patient exclusion criteria were as follows: 1) more than a month between histopathological examination and MRI scans; 2) tumor size less than 1 cm; 3) patients with incomplete clinical, imaging, and pathological data; 4) inadequate image quality.
We finally included 422 women who met the criteria in this study ( Figure 1). In the binary classification analyses, 327 women from Center 1 were randomly divided into two datasets at a ratio of 7:3 (228 in the training cohort and 99 in the internal test cohort). A total of 95 women from Center 2 were allocated to the external test cohort. To avoid overfitting in the ternary classification analysis, given the difference in sample size, all patients from Center 1 were allocated to the training cohort, and patients from Center 2 comprised the test cohort.

Analysis of Molecular Subtypes
The expressions of ER, PR, and HER2 for each patient were recorded from IHC and fluorescence in situ hybridization (FISH) results. The molecular subtypes were diagnosed in accordance with the American Society of Clinical Oncology (ASCO)/College of American Pathologists (CAP) guidelines (27). When at least 1% of tumor cell nuclei exhibited positive staining for ER or PR, the samples were considered as ER-positive or PR-positive, respectively. For the expression status of HER2, a score of 3+ was defined as positive and a score of 0 or 1+ was considered negative. And in the case of HER2 scores between 1+ and 3+, samples will be further tested by FISH for a definite diagnosis. According to the expression status of ER, PR, and HER2, IDBC patients were subdivided into three molecular subtypes as HRpositive (ER+ and/or PR+, HER2+, or HER2-), HER2-enriched (ER-, PR-, and HER2+), and TNBC (ER-, PR-, and HER2-) (Supplementary Table S1). Three binary classification tasks (task 1 as HR-positive vs HR-negative; task 2 as HER2-enriched vs non-HER2-enriched; task 3 as TNBC vs non-TNBC) were performed for predicting the molecular subtypes. In addition, to further analyze the overall molecular subtype differences, a ternary classification task (task 4 as HR-positive vs HER2-enriched vs TNBC) was developed.

MRI Examinations
For all patients enrolled in this study, images were obtained respectively with two MRI scanners, a 3.0-T MRI scanner (Philips Achieva, Center 1) and a 1.5-T scanner (GE Signa HD, Center 2). All patients were positioned in the prone position and scanned with a bilateral dedicated breast coil. T2WI, T1WI, diffusion-weighted imaging (DWI), pre-contrast-enhancement T1WI, and DCE were sequentially acquired. Gd-DTPA (Magnevist) was used as the contrast agent in both centers. Precontrast-enhancement T1WI was obtained prior to contrast agent injection. For the DCE sequence, six phases (Center 1) and seven phases (Center 2) were acquired after the end of high-pressure syringe injection with 60 s (Center 1) and 70 s (Center 2) per phase, respectively. In both centers, Gd-DTPA was intravenously injected at a dose of 0.1 mmol/kg and at a flow rate of 2 ml/s, and 20-30 ml of saline flush was subsequently injected at the same flow rate. Detailed parameters are listed in Supplementary Table S2.

Radiomics Feature Processing
As previous studies suggested, the contrast between the mass of breast cancer and background reached a peak at 60-120 s after contrast injection (28,29). In this study, we selected the first phase of DCE-MRI from Center 1 and Center 2 to segment the region of interest (ROI). The normalization preprocessing of images is necessary before ROI segmentation since images are from two hospitals. Based on the slice thickness and pixel spacing, we performed image isotropic resampling and set new spacing to (1, 1, 1) to prevent image distortion caused by different scanners. The default window width and window level were read for each slice to get the average gray value of the whole  3D data, and the gray values of all slices were normalized to the range (1, 4097) using a linear transformation. After image preprocessing, intratumoral ROIs were annotated on a "DARWIN intelligent research platform" by two radiologists (Reader 1 SZ and Reader 2 XW) with more than five years of experience, and they were blinded to the patients' clinical, radiological, and pathological data. The intratumoral ROIs were manually delimitated by Reader 1 slice by slice on the first phase of axial DCE-MRI images. Considering that the lesion boundaries are sensitive to partial-volume effects on MRI, the boundaries of the intratumoral ROI is slightly smaller than the lesion boundary observed by human eyes (30). After 1 month, 10% of the images were randomly selected from Center 1 and Center 2, and the intratumoral ROIs of the selected images were segmented again by Reader 1 and Reader 2 to calculate the inter-/ intraclass correlation coefficient (ICC) for evaluating segmentation reproducibility. Since different peritumoral regions within 10 mm of the tumor may be closely associated with tumor vascular density, lymph node metastasis, and biomarkers, we set four peritumoral ranges of 2, 4, 6, and 8 mm, drawing on a previous peritumoral radiomics study (22,31). The 2-, 4-, 6-, and 8-mm peritumoral ROIs are ring-like regions obtained by automatically expanding the intratumoral ROIs slice by slice ( Figure 2). After segmentation, the threedimensional ROIs of the intratumoral and peritumoral regions were obtained. To further amplify the abundance of features, the intratumoral and peritumoral ROIs in the original image underwent six image-filtering processes, including squared filtering, square root filtering, exponential filtering, logarithm filtering, Laplacian of Gaussian (LOG) filtering, and wavelet filtering. Finally, a total of 1,316 radiomics features were automatically extracted over the original and derived filtered  To prevent the influence of the magnitude difference between features on feature selection, we normalized the feature magnitude before feature dimensionality reduction. All features were transformed to between (-1, 1) by using the maximum absolute value normalization method. In the first feature selection step, features were retained if both inter-/intraclass correlation coefficients > 0.75. Then, a statistical feature selection method named "Select K Best" with f_calssif function were used to further reduce feature dimensionality. The threshold for K was set to 20 while calculating the F statistics of ANOVA and retaining the features with F values in the top 20%. Finally, an embedded feature selection method, L1-regularized support vector machine (SVM-L1), was used to filter out the optimal radiomics features. More details about image filtering and feature normalization processing are given in the Supplementary Materials and Methods.

Clinical-Radiological Characteristics
Clinical characteristics including age, mass palpation (firmness and mobility), and menopausal status were obtained by reviewing the patient's clinical records. Radiological characteristics including tumor size, background parenchymal enhancement (BPE), fibro glandular tissue (FGT), margin sharpness, and short diameter of axillary lymph node (ALN) were evaluated mainly by two other radiologists (Reader 3 ZY with 4 years of experience in MRI diagnosis, and Reader 4 YZ with 6 years of experience). In cases of disagreement, Reader 5 (ZX) with 14 years of experience made the final decision. After univariate and multivariate logistic regression analyses, the selected characteristics were used to be the clinical-radiological independent predictors for predicting molecular subtypes in the four classification tasks. More details about the clinicalradiological characteristics processing are given in the Supplementary Materials and Methods.

Model Construction and Validation
An overview of the radiomics analysis pipeline is shown in Figure 3. SVM was used as a classifier to construct the intratumoral radiomics model (IRM) and 2-, 4-, 6-, and 8-mm peritumoral radiomics model (PRM) with the selected radiomics features, and Rad-scores for each model were calculated. In tasks 1-3, the receiver operating characteristic (ROC) curves were plotted to evaluate the performance of each PRM, and the area under the ROC curves (AUC) was performed to select the optimal PRM. In task 4, a confusion matrix was plotted to calculate the accuracy of PRM, and the optimal PRM was selected. Subsequently, the combined intra-and peritumoral radiomics model (CIPRM) was developed based on the intratumoral and optimal peritumoral Rad-score. Clinicalradiological independent predictors in the training cohort were used to establish clinical-radiological models (CMs) by logistic regression. Finally, we used the intratumoral Rad-score, the optimal peritumoral Rad-score, and clinical-radiological independent predictors to develop the combined clinicalradiological and radiomics models (CCRMs).
In tasks 1-3, the ROCs were plotted to assess the performance of each model, and DeLong test was used to select the optimal model and to evaluate the consistency of the model in the training cohort, the internal test cohort, and the external test cohort. Tenfold cross-validation was applied to evaluate model stability. Nomogram, calibration curve, and decision curve analysis (DCA) were depicted to visualize models, evaluate model fit, and analyze clinical usefulness, respectively. In task 4, a confusion matrix was used to describe the performance of ternary classification models.

Statistical Analysis
All statistical analyses were performed using the SPSS 22.0 software (SPSS, Chicago, IL) and R software (Version 4.1.0). Kolmogorov-Smirnov test was used to test the data normality. In binary classification tasks, independent-sample t-test, Mann-Whitney U test, and binary logistic regression analysis were used to assess the association between the molecular subtypes and features. And in the ternary classification task, ANOVA, Kruskal-Wallis test, and multinomial logistic regression analysis were used to analyze the association between the molecular subtypes and features. ICC > 0.75 was considered to indicate good reproducibility of segmentation. Delong test was used to compare the differences of ROCs. P < 0.05 was considered statistically significant.

Patient Characteristics
A total of 422 patients from Center 1 (327 cases, 49.61 ± 9.36 years) and Center 2 (95 cases, 51.55 ± 9.15 years) were included in this study. HR-positive is the most common subtype and comprised 69.4% of cases (293/422), and HER2-enriched was the least common accounting for 14.0% of cases (59/422). The difference of clinicalradiological and pathological characteristics in the training, internal, and external test cohorts is not statistically significant different. The detailed clinical-radiological and pathological characteristics are described in Table 1. And the clinical-radiological independent predictors in tasks 1-4 are shown in Supplementary Figure S1 and Supplementary Table S3. The logistic regression analysis showed that a short diameter of the axillary lymph node (ALN) (P < 0.001, OR = 0.264, 95% confidence interval [CI]: 0.146-0.479) and mass palpation (mobility) (P = 0.018, OR = 0.376, 95% CI: 0.168-0.843) were selected as independent predictors in task 1; margin sharpness (P = 0.025, OR = 3.146, 95% CI: 1.158-8.547) and short diameter of ALN (P = 0.004, OR = 2.374, 95% CI:1.312-4.298) were selected as independent predictors in task 2; the short diameter of ALN (P = 0.003, OR = 2.886, 95% CI:1.436-5.800) was selected as an independent predictor in task 3. For task 4, tumor size was selected as an independent predictor.

Radiomics Features
Out of the 1,316 extracted radiomics features from the intratumor, 82.8% (1,089 features) were retained when ICCs > 0.75. The peritumor retained the same 1,089 features as the intratumor. Based on the predicted labels for tasks 1-4, 217 features were retained by the f_calssif function. After being filtered by SVM-L1,  Characteristics a : The measurement data conforming to the normal distribution were expressed as mean ± standard deviation. Characteristics b : The non-normally distributed measurement data were expressed as median (lower quartile-upper quartile). Characteristics c : The enumeration data were expressed as frequency (constituent ratio); Statistic d : H-value; Statistic e : F-value; P is derived from univariate association analyses between each of the Patients Characteristics and Molecular subtypes, and P < 0.05 is considered statistically significant. ALN, axillary lymph node; BPE, background parenchymal enhancement; FGT fibro glandular tissue; HER2, human epidermal growth factor receptor 2; HR hormone receptor; TNBC triple-negative breast cancer.

Model Performance
In tasks 1-3, DeLong test showed that the models combined with peritumoral radiomics features exhibited higher performance compared to the individual models ( Table 3). The CCRMs yielded optimal performance, and ROC analyses showed that the AUCs were 0.838, 0.827, and 0.791 in training, internal test, and external test cohorts of task 1, respectively; 0.848, 0.813, and 0.707 in task 2, respectively; 0.930, 0.879, and 0.852 in task 3, respectively ( Table 4 and Figure 4). Internal and external tests proved that the models had satisfactory repeatability. After the 10-fold cross-validation, CCRMs performed excellent stability (Supplementary Figure S2). In addition, DeLong test showed that the CCRMs have good consistency in the training, internal, and external test cohorts of tasks 1-3 (P > 0.05, Table 5). The nomograms of the CCRMs showed that intratumoral and peritumoral Rad-scores were given higher weighting compared to the clinical-radiological independent predictors ( Figure 5).
The calibration curves illustrated that CCRMs are in excellent agreement with the ideal curve, and the DCA demonstrated that CCRMs have a high overall net benefit ( Figure 6). In task 4, CIPRM instead of CCRM had the highest accuracy (training cohort: 0.697; test cohort: 0.663) and the highest F1-score (training cohort: 0.79; test cohort: 0.76) from the confusion matrix among the five models (Figure 7 and Supplementary  Table S5, Supplementary Figure S3).

DISCUSSION
In this multitask radiomics analysis, we developed the noninvasive radiomics models based on DCE-MRI to preoperatively predict molecular subtypes of IDBC. The subtype classification results of both the individual models and        combined models showed reasonable distinguishing performance. And the CCRMs achieved excellent predictive performance for predicting molecular subtypes in the binary classification tasks. In contrast to an individual clinicalradiological model or a radiomics model, the combined model with optimized peritumoral radiomics features has higher predictive values for the molecular subtypes of IDBC. The microenvironment surrounding breast cancer has been reported to contain important biological information that may result in subtle changes on MRI images (20,22,32,33). However, the optimal peritumoral area size for breast cancer on DCE-MRI remains controversial. Wang et al. found that the radiomics features extracted from the 3-mm peritumoral region could differentiate benign and malignant breast lesions on contrast-enhanced mammography (25). Li et al. extracted intratumoral and 4-mm peritumoral radiomics features to predict the expression of HER2 and Ki-67 in breast cancer, and their models achieved good performance (26). In our study, 6 mm was determined as the optimal peritumoral size in tasks 1, 3, and 4, and 8 mm was determined as the optimal peritumoral size in task 2. In agreement with our findings, Ding et al. also analyzed radiomics features of multiple peritumoral areas, and they found that 4-, 6-, and 8-mm peritumoral radiomics features could further improve the performance of lymph node metastasis radiomics models in breast cancer (31). Our results demonstrate the excellent prediction performance of peritumoral radiomics features and suggest that the optimal peritumoral size should be selected based on the predictive label, which could further optimize the performance of the model.
The SVM is one of most frequently used machine-learning methods for solving classification problems, which is not susceptible to feature colinearity and is not prone to overfitting (34). In this study, we used the SVM to construct several highperformance and stable radiomics models. In tasks 1-3, CCRMs exhibited higher performance compared to other models. After internal test, external test, and 10-fold cross-validation, CCRMs showed satisfactory repeatability and stability. In the three binary classification tasks, the CCRM identifying TNBC vs non-TNBC exhibited the highest AUC. Our results are consistent with previous studies. Son et al. constructed three models for predicting molecular subtypes of breast cancer based on digital breast tomosynthesis (DBT), and they suggested that the radiomics features were superior for predicting TNBC over HER2 and luminal-like subtypes (AUC: 0.838, 0.556, and 0.645) (35). Doris et al. reported that radiomics features extracted from multiparametric MRI could noninvasively assess breast cancer molecular subtypes (Accuracy: 0.852, AUC: 0.860) (15). Compared to these results, the binary classification models developed in our study exhibit better differentiable capability, and this may be due to the fact that our study is based on 3D segmentation and optimized the peritumoral ROIs, which could obtain more comprehensive tumor information.
Comparatively, the model performance was not adequate in task 4 (training cohort: 0.697; test cohort: 0.663). The result is in agreement with a recent study by Huang et al. who built ternary classification MRI-based models to predict molecular subtypes of breast cancer (accuracy, 0.623-0.735) (36). Although we incorporated the optimal peritumoral features, the performance is still unsatisfactory in the ternary classification task. We argue that the molecular heterogeneity of the IDBC may lead to the diversification of radiomics features, which could be amplified in the ternary classification model resulting in reduced performance. Particularly, the optimal model (CIPRM) in task 4 did not incorporate a clinical-radiological feature, which was the result of the collinearity that exists between the radiomics shape feature maximum 2D diameter and the clinical-radiological feature tumor size.
Previous studies have reported that voxel size resampling and gray level normalization could reduce the variability of radiomics features (37,38). In our study, we validated the radiomics-based classification models for evaluating the generalization ability of the models by external test cohort from center 2. As different centers have different protocols for imaging, we utilized image resampling and normalization before ROI segmentation to help mitigate the effects of data heterogeneity. After pairwise comparisons of the training cohort, the internal test cohort, and the external test cohort, Delong test showed that the ROC of CCRM was not statistically significantly different between the three groups. This result may indicate that the difference of imaging protocols in the two centers had a negligible effect on the models constructed after image preprocessing. However, the external test cohort revealed relatively low AUC compared to other groups, which may be due to the fact that image preprocessing did not completely eliminate the differences of imaging protocols in the two centers, and the sample size of the external test cohort was relatively small.
Moreover, our results suggest that there might be unique radiomics features in intratumoral and peritumoral regions associated with microscopic morphological alterations in molecular subtypes. Among the radiomics features included in this study, wavelet features accounted for the highest proportion, which may suggest that the features from wavelet-filtered DCE-MRI potentially associated with IDBC molecular subtypes. In consistent with our results, Li et al. extracted wavelet features for predicting HER2 and Ki-67 expression status and suggested that wavelet features contain more detailed information about breast cancer (26). In addition, second-and higher-order texture features accounted for a higher proportion (56.34%) compared to shape features and first-order statistical features, suggesting that regional or local variation predominates over global variation in the distribution of different molecular subtypes (39).
Several limitations exist in this study. First, this is a retrospective study, leading to inevitable selection bias. Moreover, molecular subtypes are derived from IHC results at two hospitals but not from formal genetic testing, which might result in unstable results from the external test cohort. Third, in our study, the proportion of the peritumoral size was not adjusted for the size of the tumor itself. More detailed subgroup analysis according to different tumor sizes should be further studied to illustrate the wide applicability of the tumor outer margin. Finally, this study only constructed the radiomics models based on the first phase of DCE-MRI and did not analyze the whole DCE-MRI sequence, and further radiomics research on different phases of DCE-MRI is needed in the future.
In conclusion, this study further demonstrates the feasibility of preoperative radiomics analysis in predicting the molecular subtypes of IDBC. Combining the radiomics features of the intratumoral and the optimal peritumoral region from early DCE-MRI could effectively predict the HR-positive, HER2-enriched, and TNBC molecular subtypes of IDBC and potentially provide guidance for preoperative clinical decision-making.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
Written informed consent was not obtained from the individual (s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
Guarantor of integrity of the entire study: ZX and HS. Study concepts and design: SZ. Literature research: XW, ZY, and NZ.