Radiomic Analysis of Contrast-Enhanced Mammography With Different Image Types: Classification of Breast Lesions

Objective: A limited number of studies have focused on the radiomic analysis of contrast-enhanced mammography (CEM). We aimed to construct several radiomics-based models of CEM for classifying benign and malignant breast lesions. Materials and Methods: The retrospective, double-center study included women who underwent CEM between November 2013 and February 2020. Radiomic analysis was performed using high-energy (HE), low-energy (LE), and dual-energy subtraction (DES) images from CEM. Datasets were randomly divided into the training and testing sets at a ratio of 7:3. The maximum relevance minimum redundancy (mRMR) method and least absolute shrinkage and selection operator (LASSO) logistic regression were used to select the radiomic features and construct the best classification models. The performances of the models were assessed by the area under the receiver operating characteristic curve (AUC) with a 95% confidence interval (CI). Leave-group-out cross-validation (LGOCV) for 100 rounds was performed to obtain the mean AUCs, which were compared by the Wilcoxon rank-sum test and the Kruskal–Wallis rank-sum test. Results: A total of 192 women with 226 breast lesions (101 benign; 125 malignant) were enrolled. The median age was 48 years (range, 22–70 years). For the classification of breast lesions, the AUCs of the best models were 0.931 (95% CI: 0.873–0.989) for HE, 0.897 (95% CI: 0.807–0.981) for LE, 0.882 (95% CI: 0.825–0.987) for DES images and 0.960 (95% CI: 0.910–0.998) for all of the CEM images in the testing set. 
According to LGOCV, the models constructed with the HE images and all of the CEM images showed the highest mean AUCs for the training (0.931 and 0.938, respectively; P < 0.05 for both) and testing sets (0.892 and 0.889, respectively; P = 0.55 for both), which were significantly higher than those of the two models constructed with the LE and DES images in the training (0.912 and 0.899, respectively; all P < 0.05) and testing sets (0.866 and 0.862, respectively; all P < 0.05). Conclusions: Radiomic analysis of CEM images was valuable for classifying benign and malignant breast lesions. The use of HE images or all three types of CEM images can achieve the best performance.


INTRODUCTION
Among women, breast cancer is the most commonly diagnosed cancer worldwide and the leading cause of cancer death in 103 countries (1). In the face of such a disease burden, precise and rapid diagnosis is crucial in clinical practice.
Mammography is commonly used for both the screening and the diagnosis of breast lesions. However, its sensitivity can be as low as 30-50% in women with dense breasts (2)(3)(4). Breast magnetic resonance imaging (MRI) is a state-of-the-art technique with the highest sensitivity for detecting breast cancer (5,6). However, MRI is limited by false-positive findings (7-10), lengthy examination time, high cost, and lack of accessibility for all patients (11,12).
Under such circumstances, contrast-enhanced mammography (CEM) has emerged (13). This methodology can demonstrate the morphological and angiogenic characteristics of breast lesions after the injection of iodine-based contrast material (14). CEM allows for obtaining three types of images for each craniocaudal (CC) and mediolateral oblique (MLO) view, including high-energy (HE), low-energy (LE), and dual-energy subtraction (DES) images (Figure 1). LE and DES images are used for clinical diagnosis, among which the former is considered to be equivalent to routine mammography (15,16), and the latter can highlight areas of contrast enhancement (17).
In recent years, radiomics has been developing rapidly. It utilizes high-throughput computing to extract large numbers of image features and converts images into quantifiable data (18)(19)(20). Since CEM images can reflect both morphological and functional features of the lesions, as MRI does, and can have high spatial resolution comparable with that of mammography, we suspect that CEM would also have an encouraging application in the field of radiomics. Several studies have performed some preliminary work in this aspect (21)(22)(23)(24)(25), but the number of such studies is rather limited. Furthermore, it is worth noting that no studies thus far have used HE images in radiomic analysis since this type of image is not used for clinical diagnosis. Instead, they used LE images, DES images, or a combination of the two in their research. Although HE images are not used for clinical diagnosis (17), the subtle features of HE images may be mined with the help of radiomics. Therefore, we suppose that radiomic analysis of HE, LE, and DES images may contribute to the diagnosis of breast lesions. The purpose of this study is to construct radiomics-based models and to identify the model that can better classify breast lesions, which may be helpful for radiologists in decision-making.
Abbreviations: AUC, area under the receiver operating characteristic curve; BI-RADS, breast imaging reporting and data system; CC, craniocaudal; CEM, contrast-enhanced mammography; CI, confidence interval; DES, dual-energy subtraction; DICOM, digital imaging and communications in medicine; HE, high-energy; LASSO, least absolute shrinkage and selection operator; LE, low-energy; LGOCV, leave-group-out cross-validation; MLO, mediolateral oblique; MRI, magnetic resonance imaging; mRMR, maximum relevance minimum redundancy; ROC, receiver operating characteristic; ROI, region of interest.

Study Participants
This is a retrospective, double-center study. The Institutional Review Board and Ethics Committee of each center approved this study. The requirement for written informed consent was waived. We collected consecutive CEM images from the two institutions between November 2013 and February 2020. No study cohorts have been previously reported.
The inclusion criteria were as follows: (1) patients with suspected breast lesions after physical examination or ultrasound and referred for CEM as part of diagnostic imaging, (2) patients who completed CEM examinations, and (3) patients with a final diagnosis that was confirmed by histopathology results within 2 weeks after CEM examination. We first excluded patients (1) lacking medical history, (2) with missing or incomplete image data, and (3) with a history of breast surgery, breast radiotherapy, chemotherapy, or hormone treatment within 1 year prior to CEM examination. After preliminary evaluation of all the images, we further excluded patients (1) with poor image quality and (2) with no lesions detected on any of the HE, LE, or DES images. The flowchart of the patient inclusion and exclusion criteria is shown in Figure 2.

CEM Image Interpretation by Human Readers
Two radiologists (NM and RL) with 8-10 years of experience in breast imaging and 5 years of experience interpreting CEM images reviewed the medical histories and the CEM images of all the potential participants and selected the eligible ones based on the inclusion and exclusion criteria. Several lesion characteristics were obtained by the radiologists according to the Breast Imaging Reporting and Data System (BI-RADS) lexicons for mammography and MRI (26). Breast density (a, b, c, or d) and lesion type (mass, calcification, architectural distortion, or asymmetry) characteristics were obtained from LE images. Degree of enhancement (no, mild, moderate, or marked enhancement), type of enhancement (focal, mass, or non-mass), and degree of background parenchymal enhancement (minimal, mild, moderate, or marked) characteristics were obtained from DES images.

Reference Standard
The histopathological results obtained by biopsy or surgical specimens within 2 weeks after CEM examination are regarded as the reference standard for the classification of breast lesions in this study. None of the patients had undergone any form of treatment for the suspected lesions before the specimens were obtained. Lesions containing any invasive component or ductal carcinoma in situ are considered malignant; otherwise, the lesions are considered benign.

CEM Examination
The examination protocols adopted by the two institutions were the same. All CEM examinations were performed using Senographe Essential mammography units (GE Healthcare, Buc, France). Before the examination, each patient rested for a while, and an intravenous catheter needle was placed in the antecubital fossa vein. A dose of 1.5 ml/kg body weight iodinated contrast material (iohexol, 300-350 mg I/ml; Beilu Pharmaceutical Co., Ltd., Beijing, China) was injected intravenously using an automated power injector at a flow rate of 3.0 ml/s, followed by a 10-ml bolus of saline. Two min after the injection of the contrast material, dual-energy exposures were performed using a fully automated exposure control method depending on the breast density and thickness. Bilateral CC views were obtained first, beginning with the suspicious breast. Then, bilateral MLO views were acquired in the same order. In a single projection, LE (peak tube voltage: 26-31 kVp) and HE (peak tube voltage: 45-49 kVp) exposures were performed continuously within 1.5 s to reduce motion artifacts. A proprietary algorithm was used to automatically reconstruct the HE and LE images to generate the DES images with the digital mammography unit. The total examination time of each patient since the injection of contrast material did not exceed 10 min. No contrast material-related adverse reactions were found in this study.

Lesion Segmentation
For radiomic analysis, all of the CEM images were stored in the format of Digital Imaging and Communications in Medicine (DICOM) and loaded into an open-source image processing platform ITK-SNAP (version 3.6; www.itksnap.org) (27). Two radiologists (SW and QL) with 3-5 years of experience in breast imaging and 1 year of experience interpreting CEM images manually delineated the regions of interest (ROIs) together along the boundary of the lesions. Both of them were blinded to the patients' medical histories and histopathological results. A month later, they randomly selected 30 patients and resegmented the lesions to assess consistency for manual segmentation.
The criteria of lesion segmentation were as follows: (1) all HE images were transformed into negative films by the ITK-SNAP software for ROI delineation (Figures 3A-C); (2) for all patients, contours were separately delineated on HE, LE, and DES images of CC and MLO views if the lesions were visible on each image; if not, contours were delineated on either HE, LE, or DES images depending on which provided the preferable visualization of the lesion. Then, these contours were mapped onto other images, ensuring six ROIs for each lesion; (3) for lesions such as microcalcification, asymmetry, or architectural distortion without corresponding mass in LE images, closed loops were delineated along the edge of the lesions (Figures 3D,E); (4) multiple non-adjacent lesions were delineated separately and regarded as different lesions. The radiologists delineated all the suspicious lesions they had identified; (5) after lesion segmentation, another radiologist (TJ) who was not blinded to the histopathological results with 8 years of experience in breast imaging and 5 years of experience interpreting CEM images reviewed all the ROIs and deleted the ones without corresponding histopathological results. Therefore, only the pathologically proven lesions were retained in the following analysis.

Feature Extraction
Before radiomic feature extraction, image preprocessing, including image resampling and gray-level discretization, was performed. All images were resampled to a uniform pixel size of 0.2 × 0.2 mm. Gray-level discretization was performed to discretize all the images to 256 gray levels. Then, the ROIs and the matched raw data were integrated into the Analysis Kit software (version 3.2.0; GE Healthcare) to extract the radiomic features. For each ROI, a feature dataset consisting of 392 radiomic features (including 42 histogram features, 5 shape features, and 345 textural features) was obtained (Supplementary Table 1).
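To make the gray-level discretization step concrete, the following is a minimal sketch in Python. It assumes a fixed-bin-count rule that rescales intensities between the ROI minimum and maximum; the paper does not state which binning rule the Analysis Kit software applies, so the function `discretize` and the sample intensities are purely illustrative.

```python
def discretize(pixels, n_levels=256):
    """Map raw intensities onto integer gray levels 0 .. n_levels - 1.

    Assumption: fixed-bin-count scaling between the minimum and maximum
    intensity. This rule is illustrative and not taken from the paper.
    """
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A constant region carries no contrast; put everything in bin 0.
        return [0 for _ in pixels]
    scale = n_levels / (hi - lo)
    # The maximum intensity would land on n_levels, so clamp it to the top bin.
    return [min(int((p - lo) * scale), n_levels - 1) for p in pixels]


# Hypothetical raw intensities from one ROI.
roi = [1200.0, 1350.5, 1500.0, 2047.0, 4095.0]
levels = discretize(roi)
```

Discretizing to a common number of gray levels keeps texture matrices the same size across lesions, which is one reason this step usually precedes texture feature extraction.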

Feature Selection and Radiomics Model Construction
The radiomics classification model was used to differentiate malignant from benign lesions. The workflow of the study is shown in Figure 4, and the general structure of the radiomics model is shown in Figure 5.
Before constructing the models, each dataset was randomly divided into training and testing sets (training vs. testing set; 7:3) using the stratified random sampling method. The training set was further divided into the training and validation subsets to perform 10-fold cross-validation. For radiomic feature selection, we performed a three-step procedure. First, for the assessment of consistency for manual segmentation, the interobserver agreement test was conducted to choose the features that were not sensitive to the variation of manual segmentation. The features with intraclass correlation coefficients >0.75 were kept in the datasets and used for constructing the classification models. Second, the maximum relevance minimum redundancy (mRMR) method (28) was performed to select the most relevant and least redundant radiomic features. Twenty features were retained for subsequent analyses. Third, least absolute shrinkage and selection operator (LASSO) logistic regression was used to find the most predictive subsets of features and to construct the corresponding classification models. As a sparse penalized regression approach, LASSO regression (29, 30) has many desirable properties for regression models with a large number of covariates (31,32) and can reduce variability and improve model accuracy. The penalty parameters of the models were tuned through 10-fold cross-validation, thus yielding the best classification model. The testing set was independently used to evaluate the performance of the established model without being involved in model construction or parameter tuning. The performances of the models were evaluated in terms of area under the receiver operating characteristic (ROC) curve (AUC) value with a 95% confidence interval (CI). The accuracy, sensitivity, and specificity of the models were also calculated by selecting an optimal threshold based on Youden's Index.
Considering the variation and sampling bias due to the random split of the datasets, we employed a nested cross-validation method by further conducting 100 rounds of leave-group-out cross-validation (LGOCV) to obtain 100 AUCs and sensitivity and specificity values, which are shown as means ± standard deviations. To compare the mean AUC values of different models, non-parametric tests were adopted, including the Wilcoxon rank-sum test for comparisons between two groups and the Kruskal-Wallis rank-sum test for comparisons among more than two groups.
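The two evaluation quantities named above, the AUC and the threshold chosen by Youden's Index, can be illustrated with a short Python sketch. The labels and scores are hypothetical model outputs, not data from this study, and the helper names `auc` and `youden_threshold` are introduced here for illustration only (the study itself used the pROC package in R).

```python
def auc(labels, scores):
    """AUC as the probability that a malignant case outscores a benign one.

    Ties between a malignant and a benign score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(labels, scores):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        # A lesion is called malignant when its score is >= t.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        # Strict comparison keeps the lowest threshold among tied J values.
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical predicted probabilities: 1 = malignant, 0 = benign.
y = [0, 0, 0, 0, 1, 1, 1, 1]
p = [0.10, 0.30, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]

area = auc(y, p)                              # 15 of 16 pairs ranked correctly
threshold, sens, spec = youden_threshold(y, p)
```

In this toy example two thresholds tie on Youden's Index; the sketch keeps the lower one, which favors sensitivity. How ties are broken is a design choice that the paper does not specify.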
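The LGOCV procedure described above amounts to repeating the stratified 7:3 split many times and summarizing the resulting test metrics. The sketch below shows that loop in Python under stated assumptions: the data are synthetic, and `midpoint_accuracy` is a toy stand-in for the full mRMR and LASSO pipeline, which is not reproduced here. All function names are hypothetical.

```python
import random
import statistics


def stratified_split(labels, train_frac, rng):
    """Return (train_idx, test_idx), sampling each class separately."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def lgocv(features, labels, evaluate, rounds=100, train_frac=0.7, seed=0):
    """Repeat random stratified splits and collect one test metric per round."""
    rng = random.Random(seed)
    results = []
    for _ in range(rounds):
        train, test = stratified_split(labels, train_frac, rng)
        results.append(evaluate(features, labels, train, test))
    return statistics.mean(results), statistics.stdev(results)


def midpoint_accuracy(x, y, train, test):
    """Toy stand-in for a model: threshold halfway between the class means."""
    mean1 = statistics.mean(x[i] for i in train if y[i] == 1)
    mean0 = statistics.mean(x[i] for i in train if y[i] == 0)
    cut = (mean0 + mean1) / 2
    # The threshold is learned on the training indices and scored on the test indices.
    return sum((x[i] > cut) == (y[i] == 1) for i in test) / len(test)


# Synthetic single-feature data, for illustration only.
data_rng = random.Random(42)
labels = [0] * 40 + [1] * 50
features = [data_rng.gauss(2.0 * y, 1.0) for y in labels]
mean_acc, sd_acc = lgocv(features, labels, midpoint_accuracy)
```

Because every round refits on a fresh training subset, the spread of the 100 values reflects how sensitive the metric is to the particular partition, which is the variation the mean ± standard deviation summaries are meant to capture.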

Statistical Analysis
All statistical analyses were performed using the statistical software R (version 3.6.1; www.r-project.org). The LASSO logistic regression and ROC curve analyses were conducted using the glmnet and the pROC software packages. Student's t-test was used to compare the continuous variables between the benign and malignant groups, and the Chi-square test or Fisher's Exact test was used for the categorical variables, as appropriate. The false discovery rate correction was also performed for multiple comparisons. A two-sided P-value of less than 0.05 was considered statistically significant.
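The text does not name the false discovery rate procedure that was applied. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg adjustment in Python; the choice of this procedure, the function name `benjamini_hochberg`, and the raw P-values are all assumptions made for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values from five comparisons.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
```

In this example the third and fourth comparisons have raw P-values below 0.05 but adjusted values of about 0.051, so they would no longer meet the significance threshold after correction.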

Study Population and Lesion Characteristics
The patient and lesion characteristics are given in Table 1

Performances of the Best Classification Models
The performances of the best classification models of different types of CEM images are shown in Table 2 and Figure 6. The selected radiomic features and their corresponding coefficients are provided in Supplementary Figure 1.
In both the training and testing sets, the performances of the models generated by any type of CEM images ( In terms of sensitivity, in both the training and the testing sets, Model 1 and Model 4 still ranked first (sensitivity = 91.9% for both in the testing set), followed by Model 3 (sensitivity = 83.8% in the testing set). Model 2 constructed by LE images showed the lowest sensitivity in both the training and testing sets (sensitivity = 81.1% in the testing set). In terms of specificity in the testing set, Model 2 ranked first (specificity = 90.0%), followed by Model 4 (specificity = 86.7%), and both Model 1 and Model 3 had specificity values of 83.3%. All of the models showed similar trends in the training and testing sets in terms of AUC, accuracy, sensitivity, and specificity.

LGOCV Analysis
After 100 rounds of LGOCV, the obtained mean values of the AUC, sensitivity, and specificity are displayed in Table 3.
Consistent with the performances of the abovementioned best models, the mean AUCs generated by any type of CEM images (Table 3: Models 1-3) were still good, with all mean AUCs ≥0.862 in both the training and testing sets. In both the training and testing sets, the differences of mean AUCs of all the models (Table 3: Models 1-4) were statistically significant (Figure 7, both P < 0.01). In the testing set, Model 1 constructed by HE images and Model 4 constructed by the combination of HE, LE, and DES images still achieved the highest levels of AUCs after 100 rounds of LGOCV (mean AUC = 0.892 ± 0.040 for Model 1; mean AUC = 0.889 ± 0.038 for Model 4; P = 0.55), followed by the other two models (mean AUC = 0.866 ± 0.045 for Model 2; mean AUC = 0.862 ± 0.039 for Model 3; P = 0.31). The mean AUCs of Model 1 and Model 4 were significantly higher than those of Model 2 and Model 3 (all P < 0.01) in the testing set, which was consistent with the results of the best classification models without conducting LGOCV analysis (Table 2).
In the testing set, Model 1 and Model 4 still reached high levels of mean sensitivity (mean sensitivity = 88.7% for Model 1; mean sensitivity = 88.3% for Model 4), followed by Model 3 (mean sensitivity = 83.2%) and Model 2 (mean sensitivity = 80.1%). In terms of specificity, all of the models reached good mean specificity levels ranging from 85.9 to 88.7% in the training set and from 81.8 to 85.0% in the testing set. It is worth noting that the mean specificity value of Model 2 constructed by LE images still ranked first in both the training and testing sets (mean specificity = 88.7% for the training set; mean specificity = 85.0% for the testing set), which was in line with the results of the best classification models ( (Supplementary Figure 2). We also provided several heatmaps showing the median AUCs of all the models constructed with the 12 datasets to make the results more intuitive (Supplementary Figure 3). The median AUCs are essentially in parallel with the mean AUCs of the corresponding models.

DISCUSSION
Our study has proposed a feasible radiomic analysis method for CEM images for the differentiation of benign and malignant breast lesions. The findings in our study have shown that the models constructed with any type of CEM images show good performances, among which the model constructed with HE images performed the best. When the model employs the radiomic features of all three types of images, it consistently achieves fairly satisfactory results with a high level of robustness. The result suggests that all CEM images contribute to the diagnosis of breast lesions to some extent, probably because they can reflect diverse image characteristics containing complementary information. Importantly, although HE images are currently thought to be clinically uninterpretable, they may contain useful information as original images without being postprocessed and may be valuable in the field of radiomics. Furthermore, we found that the radiomic analysis of DES images alone did not perform as well as that of the other image types. This finding is similar to another study (22) in which DES images were considered to have lost some heterogeneity information due to the digital subtraction process.
To the best of our knowledge, this is the first study to fully evaluate the diagnostic performances of all types of CEM images with use of radiomics. In addition, the numbers of benign and malignant cases are relatively balanced, which may reduce the potential classification bias toward the majority of cases and the consequent overfitting problem. Some previous studies have employed undersampling (21) or oversampling (22) techniques under these circumstances, but we attempted to avoid the problem from the origin.
In our study, we have defined the method of lesion segmentation in detail by converting HE images into negative images and mapping the optimal ROIs from one type of image to the other type. In practice, we found that after converting HE images into negative films and adjusting the window level and window width, the outline of the lesion became clearer and could be delineated effectively. Additionally, most of the patients in our research had dense breasts, which are common in Asian women; therefore, it was difficult to accurately segment the lesions in LE images in some cases. Thus, the method of mapping the optimal ROIs among different types of CEM images is advisable and sometimes even necessary in women with extremely dense breasts. One previous study concluded that by mapping the optimal lesion segmentation from DES images onto LE images, the classification performance can be significantly improved (22). Some studies used the radiomic features of either CC or MLO view images (21,33), or both of them (23,24), whereas another study (22) used the mean value of two feature values separately computed from CC and MLO view images to represent the final feature value. Instead, we used the radiomic features of both CC and MLO view images, hoping to make the best use of the image information.
In addition to the 10-fold cross-validation, we further conducted 100 rounds of LGOCV to validate the performances of the models. The AUCs of different models were relatively stable since they showed essentially the same trend before and after the LGOCV step. Since the results of radiomic analysis can be affected by the data to some extent, this cross-validation method can minimize the case partition bias.
Two meta-analyses reported the pooled specificity of CEM in the diagnosis of breast cancer to be 58-84% (34,35), which denotes the discrepancies for specificity between studies and leaves room for further improvement in the diagnostic accuracy. The results of our study have shown the potential to improve the specificity, with the highest mean specificity value >84% (Table 3: Model 2). If our results are further substantiated in future prospective studies, the invasive biopsies of benign lesions might be reduced with the help of non-invasive radiomic analysis of CEM images. However, it seems that the HE and DES images do not contribute greatly to the improvement of the specificity obtained with the LE images, which still needs further exploration.
Our study had the following limitations. First, we mixed the data from two research centers to train and test the models rather than using the data from Center 2 for independent external testing; furthermore, all of the CEM examinations were performed on the same type of equipment. These factors limit the extrapolation of our conclusions. Second, since this is a retrospective study, some image data, especially HE images, were missing, which led to a smaller sample size than expected. Further prospective research with a larger sample size is warranted. Third, we manually delineated the contours of the lesions, which may affect the repeatability of the research. Fourth, the sensitivities in our study were not as high as the ones reported by human readers (34,35), which may need further improvement in future studies. Finally, the dataset in our study was enriched for malignant lesions, thus likely overestimating the models' performances to some extent.
In summary, we proposed a radiomics-based method to classify benign and malignant breast lesions using CEM images and found that all of the HE, LE, and DES images of CEM can provide valuable information in the process, among which HE images seem to perform better than the others. It is recommended that all CEM images should be used in radiomic analysis to obtain the most satisfactory and stable performance in breast lesion classification.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of Shanghai Cancer Center and the Ethics Committee of Yantai Yuhuangding hospital. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.