Diagnosis of Breast Cancer Using Radiomics Models Built Based on Dynamic Contrast Enhanced MRI Combined With Mammography

Objective: To build radiomics models using features extracted from DCE-MRI and mammography for diagnosis of breast cancer.
Materials and Methods: A total of 266 patients receiving MRI and mammography, who had well-enhanced lesions on MRI and histologically confirmed diagnosis, were analyzed. The training dataset had 146 malignant and 56 benign lesions, and the testing dataset had 48 malignant and 18 benign lesions. A fuzzy-C-means clustering algorithm was used to segment the enhanced lesion on subtraction MRI maps. Two radiologists manually outlined the corresponding lesion on mammography by consensus, with the guidance of the MRI maximum intensity projection. Features were extracted using PyRadiomics from three DCE-MRI parametric maps, and from the lesion and a 2-cm bandshell margin on mammography. The support vector machine (SVM) was applied for feature selection and model building, using 5 datasets: DCE-MRI, mammography lesion-ROI, mammography margin-ROI, mammography lesion+margin, and all combined.
Results: In the training dataset evaluated using 10-fold cross-validation, the diagnostic accuracy of the individual models was 83.2% for DCE-MRI, 75.7% for mammography lesion, 64.4% for mammography margin, and 77.2% for lesion+margin. When all features were combined, the accuracy improved to 89.6%. By adding mammography features to MRI, the specificity was significantly improved from 69.6% (39/56) to 82.1% (46/56), p<0.01. When the developed models were applied to the independent testing dataset, the accuracy was 78.8% for DCE-MRI and 83.3% for combined MRI+mammography.
Conclusion: The radiomics model built from the combined MRI and mammography has the potential to provide a machine learning-based diagnostic tool and decrease the false positive diagnosis of contrast-enhanced benign lesions on MRI.


INTRODUCTION
Breast cancer is the most common cancer in women and a leading cause of cancer death (1,2). Mammography, ultrasound, and magnetic resonance imaging (MRI) are well-established diagnostic modalities, which are known to reveal different aspects of underlying abnormalities and provide complementary information for diagnosis (3,4). Dynamic contrast-enhanced MRI (DCE-MRI) can assess angiogenesis (5,6), which is essential for cancer development and progression (7,8). The high spatial resolution and 3D imaging capability of MRI allow for detecting early small cancers, and for evaluating the extent of the disease for pre-operative staging and treatment planning. However, some benign diseases may show strong contrast enhancement and lead to a false positive diagnosis (9).
Mammography can detect breast cancer based on the presence of a mass, microcalcifications, architectural distortion, or asymmetric density. It is a widely used imaging modality for screening and diagnosis, and is crucial for detecting breast cancer at an early, curable stage to decrease mortality (10). However, mammography is limited by breast density, which may compromise the detection sensitivity. For women with a high risk of developing breast cancer, screening is recommended to start at a young age, and MRI is commonly used as a supplementary modality to mitigate the problem of high breast density in mammography. Since different imaging modalities can evaluate different pathological characteristics of the abnormal tissue, combining them may improve the diagnostic accuracy (3). MRI is also commonly used for problem-solving when other imaging shows equivocal findings. For example, in patients with category 4 mammographic microcalcifications, MRI can decrease false positive findings and unnecessary biopsy (11).
The Breast Imaging Reporting and Data System (BI-RADS) (12) is used to indicate the level of suspicion of a detected abnormality. However, subjective reading using the BI-RADS lexicon achieved only moderate levels of inter-reader agreement (13). For MRI, intra- and inter-observer agreement was notably worse for non-mass enhancement than for mass lesions (14,15). To circumvent this problem, computer-aided diagnosis (CAD) systems have been proposed to develop quantitative models that are not subject to high reader variation and can serve as potential diagnostic tools (16,17).
Artificial intelligence (AI)-based radiomics has been widely applied in medical imaging. The method allows for high-throughput extraction of quantitative features from radiographic images (18), and it has been shown to be a feasible approach for diagnosis of breast cancer using mammography (19-22) and MRI (23-25). However, combined models using different imaging modalities have rarely been reported. Features from corresponding lesions on each modality can be extracted, and then combined in the selection process to develop better models based on their complementary information.
The purpose of this study was to evaluate the diagnostic performance of radiomics models built based on DCE-MRI and mammography. The motivation came from the high rate of false positive diagnosis of contrast-enhanced benign lesions commonly seen on MRI. It is anticipated that the complementary information provided by the radiomics analysis of the lesion on mammography may help to improve the diagnostic accuracy. In mammography, features extracted from the lesion and the margin were used to build separate models. The complementary role of MRI and mammography was first evaluated by the selected features, and then by comparing the performance of final models built using each modality alone and in combination.

Study Population
This retrospective study was approved by the Institutional Review Board, and the requirement for written informed consent was waived. Patients who received DCE-MRI and mammography for diagnosis between July 2017 and August 2019 and had confirmed pathology were retrospectively identified as the training set. Patients examined later, from September 2019 to July 2020, were used as the independent testing set. The exclusion criteria were: (1) no pathology result; (2) lesion not visible on MRI or mammography; (3) prior surgery, chemotherapy, or other treatment; (4) an interval between the two examinations longer than one month; (5) poor image quality. Finally, a total of 268 lesions were included: 202 lesions (146 malignant and 56 benign) in the training set, and 66 lesions (48 malignant and 18 benign) in the testing set. The BI-RADS scores of MRI and mammography were obtained from the radiology reports, classified into 2, 3, 4A, 4B, 4C, and 5. In our institution, BI-RADS 4 MRI cases were routinely subdivided into 4A, 4B, and 4C, as validated in Strigel et al. (26).

Image Acquisition
Mammography was performed using a Fujifilm Amulet Innovality Digital Mammography System with a resolution of 5828×4728 pixels, including craniocaudal (CC) and mediolateral oblique (MLO) views. MRI was performed on a 3.0T scanner (GE SIGNA HDx) using a dedicated 8-channel bilateral breast coil. The imaging protocol included axial and sagittal T2- and T1-weighted sequences, and the DCE acquisition was performed using the volume imaging for breast assessment (VIBRANT) sequence. The parameters were: repetition time = 5 msec, echo time = 2 msec, flip angle = 10°, slice thickness = 1.2 mm, field of view = 34×34 cm², matrix size = 416×416, temporal resolution = 90 sec, and total scan time = 9 min. The DCE series consisted of 6 frames: one pre-contrast and 5 post-contrast. The contrast agent, 0.1 mmol/kg body weight of gadopentetate dimeglumine (Magnevist; Bayer Schering Pharma), was injected after the pre-contrast images were acquired, with a flow rate of 2 mL/s followed by a flush of 20 mL saline.
Abbreviations: AI, artificial intelligence; AUC, the area under the curve; BI-RADS, Breast Imaging Reporting and Data System; CAD, computer-aided diagnosis; CC, craniocaudal; DCE, dynamic contrast enhanced; DCIS, ductal carcinoma in situ; GLCM, gray-level co-occurrence matrix; GLDM, gray-level dependence matrix; GLRLM, gray-level run length matrix; GLSZM, gray-level size zone matrix; IDC, invasive ductal cancer; MIP, maximum intensity projection; MLO, mediolateral oblique; MRI, magnetic resonance imaging; NGTDM, neighboring gray tone difference matrix; NPV, negative predictive value; PPV, positive predictive value; ROC, receiver operating characteristic; ROI, region of interest; SE, signal enhancement; SVM, support vector machine.

Tumor Segmentation
For MRI, the tumor region of interest (ROI) was segmented using computer algorithms, given the location of the lesion and the range of slices containing it. The fuzzy-C-means clustering algorithm was applied to perform segmentation on each DCE slice containing the lesion. The automatic segmentation results were evaluated by two radiologists separately, and adjusted if necessary. Then, the ROIs from all slices were combined, and the 3D connected-component labeling and the hole-filling algorithms were applied to generate the final 3D mask (27,28). For the corresponding mammography, two radiologists manually outlined the lesion on the craniocaudal (CC) or mediolateral oblique (MLO) view by consensus using ITK-SNAP software (version 3.8, www.itksnap.org), with the guidance of the lesion shown on the maximum intensity projection (MIP) of MRI, projected from different angles. The choice of CC or MLO was determined according to the lesion visibility, and only one view was used.
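The clustering step can be illustrated with a minimal sketch. It runs a standard two-cluster fuzzy C-means on a flat list of pixel intensities and labels as enhanced every pixel whose membership in the brighter cluster exceeds 0.5. The function names, the fuzzifier m=2, the 0.5 membership cut-off, and the evenly spaced initial centers are assumptions for illustration; the source does not state these settings, and the radiologist review, 3D connected-component labeling, and hole filling are not reproduced here.

```python
def memberships(values, centers, m=2.0):
    """Fuzzy membership of each value in each cluster (each row sums to 1)."""
    rows = []
    for x in values:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # Value coincides with a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            rows.append([v / sum(inv) for v in inv])
    return rows


def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=100, tol=1e-6):
    """Standard fuzzy C-means on 1-D intensities; returns (centers, memberships)."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: centers spread evenly over the intensity range.
    centers = [lo + (hi - lo) * (k + 0.5) / n_clusters for k in range(n_clusters)]
    for _ in range(n_iter):
        u = memberships(values, centers, m)
        new_centers = []
        for k in range(n_clusters):
            w = [row[k] ** m for row in u]
            new_centers.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(values, centers, m)


def enhanced_mask(values):
    """True for pixels assigned mainly to the brighter (enhanced) cluster."""
    centers, u = fuzzy_c_means(values)
    bright = centers.index(max(centers))
    return [row[bright] > 0.5 for row in u]
```

In practice the input would be the subtraction-image intensities of one slice, and the resulting per-slice masks would then be stacked and cleaned up as described above.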

MRI and Mammography Radiomics Feature Extraction
The analysis flowchart is shown in Figure 1. Features were extracted from three DCE-MRI parametric maps using PyRadiomics (29). On each parametric map, a total of 107 features were extracted, including 14 shape, 18 first-order, 24 gray-level co-occurrence matrix (GLCM), 14 gray-level dependence matrix (GLDM), 16 gray-level run length matrix (GLRLM), 16 gray-level size zone matrix (GLSZM), and 5 neighboring gray tone difference matrix (NGTDM) features, so there was a total of 321 parameters from 3 maps. Only the 268 features showing an intra-class correlation coefficient (ICC) ≥0.8 were included in the final analysis; the ICC was determined using two sets of separately segmented tumor ROIs to evaluate the reproducibility of extracted radiomics features (30). For mammography, two different feature sets were analyzed. Considering that the ROI was manually drawn by tracing the visible lesion area based on density, it might not reveal the margin information. To specifically focus on the margin, a 2-cm bandshell was created by shrinking and expanding the manually drawn tumor boundary by 1 cm, as shown in Figure 1. Because the margin could not be well defined on mammography, shrinking the boundary followed by region growing has been shown to be a feasible segmentation method (31), and the method was adopted here to generate the bandshell for analysis of margin features. Similarly, the intensity was normalized to mean=0 and standard deviation=1, and a total of 107 PyRadiomics features were extracted from the outlined lesion mask and also from the bandshell on mammography. The radiomics model was first built using lesion features alone and margin features alone, and then a combined model was built by considering all lesion and margin features.
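The bandshell construction can be sketched as the difference between an expanded and a shrunken copy of the lesion mask. The sketch below uses plain morphological dilation and erosion with a disk-shaped neighborhood on a small binary grid; this is an assumed stand-in, since the source describes shrinking and expanding the boundary by 1 cm but does not give the implementation. The function names are hypothetical, and the radius is given in pixels because the mammography pixel spacing needed to convert 1 cm is not stated in the source.

```python
def _disk_offsets(radius_px):
    """Pixel offsets lying within a disk of the given radius."""
    r = int(radius_px)
    return [(dy, dx)
            for dy in range(-r, r + 1)
            for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= radius_px * radius_px]


def dilate(mask, radius_px):
    """Expand the mask: a pixel is set if any mask pixel lies within the disk."""
    h, w = len(mask), len(mask[0])
    offs = _disk_offsets(radius_px)
    return [[any(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                 for dy, dx in offs)
             for x in range(w)] for y in range(h)]


def erode(mask, radius_px):
    """Shrink the mask: a pixel survives only if the whole disk is inside it."""
    h, w = len(mask), len(mask[0])
    offs = _disk_offsets(radius_px)
    return [[all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                 for dy, dx in offs)
             for x in range(w)] for y in range(h)]


def bandshell(mask, radius_px):
    """Band around the boundary: expanded mask minus shrunken mask."""
    outer = dilate(mask, radius_px)
    inner = erode(mask, radius_px)
    return [[outer[y][x] and not inner[y][x] for x in range(len(mask[0]))]
            for y in range(len(mask))]
```

With a radius corresponding to 1 cm on each side of the outline, the band is 2 cm wide, which matches the bandshell width described above.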

Feature Selection and Model Building in Training Set
The procedures are also shown in Figure 1. In addition to the normalization on images, each feature extracted from all cases was normalized to mean=0 and standard deviation=1 before training. To evaluate the importance of these features in diagnosis, a sequential forward feature selection method using the support vector machine (SVM) was applied (32,33). In this process, we used SVM with a Gaussian kernel as the objective function to test the performance of models built with a subset of features. The selection started with an empty candidate set, and features were added sequentially. Ten-fold cross-validation was applied to test the model performance. In each iteration, the training process was repeated 1,000 times to explore the robustness of each feature. After each iteration, the feature that led to the best performance was added to the candidate set. When the addition of features no longer met the criterion, the selection process stopped. Here, we used 10*e^-6 as termination tolerance for the objective function value.
FIGURE 1 | The analysis flowchart. The training and testing sets are assembled according to the time of case enrollment. The analysis starts with ROI segmentation, followed by radiomics feature extraction using PyRadiomics, feature selection and model building in the training set using SVM with cross-validation, and lastly, the testing of the 5 developed models in the testing set.
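The greedy loop behind sequential forward selection can be sketched as follows. The selection routine takes any scoring function, so the objective is pluggable. To keep the example self-contained, the objective used here is the resubstitution accuracy of a nearest-centroid classifier, which is a deliberately simple stand-in and not the Gaussian-kernel SVM with 10-fold cross-validation and 1,000 repetitions described above. All function names and the default tolerance value are hypothetical.

```python
def forward_select(n_features, score_fn, tol=1e-6):
    """Greedy forward selection: add the best feature until the gain is <= tol."""
    selected, best = [], float("-inf")
    remaining = list(range(n_features))
    while remaining:
        # Score every candidate subset formed by adding one more feature.
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_f = max(scored, key=lambda t: t[0])
        if top_score - best <= tol:
            break  # no candidate improves the objective enough
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected, best


def nearest_centroid_accuracy(X, y, features):
    """Stand-in objective (NOT the paper's SVM): accuracy on the training data."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, label in zip(X, y):
        def dist(c):
            return sum((x[f] - mu) ** 2 for f, mu in zip(features, centroids[c]))
        if min(classes, key=dist) == label:
            correct += 1
    return correct / len(y)


# Toy data: feature 0 separates the classes, features 1 and 2 do not.
X = [[0.0, 5.0, 1.0], [0.2, 1.0, 0.0], [0.1, 3.0, 1.0],
     [1.0, 4.0, 0.0], [0.9, 2.0, 1.0], [1.1, 5.0, 0.0]]
y = [0, 0, 0, 1, 1, 1]
selected, best = forward_select(3, lambda feats: nearest_centroid_accuracy(X, y, feats))
```

On this toy data the loop picks feature 0 first and then stops, because no further feature raises the score. Swapping in a cross-validated SVM score would only change `score_fn`, not the selection loop.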
The selected features were used to build the SVM classification model with a Gaussian kernel to classify the benign and malignant groups. The diagnostic performance was tested using 10-fold cross-validation. Each case had only one chance to be included in the validation set. The probabilities of all cases in the validation sets were pooled to perform receiver operating characteristic (ROC) curve analysis, and the area under the curve (AUC) was calculated. Five models were built using features extracted from: 1) DCE-MRI; 2) mammography lesion ROI; 3) mammography margin ROI, i.e., the bandshell; 4) mammography lesion+margin; and 5) all combined. The developed model gave a radiomics score, i.e., the malignancy probability, for each case.
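The pooling of validation scores can be sketched as below: the cases are split into k folds so that each case is validated exactly once, the predicted probabilities are collected into one list, and the AUC is computed over the pooled list. The function names and the `fit_predict` callback are hypothetical, the split is a plain shuffled (not stratified) split because the source does not say how folds were formed, and the AUC uses the pairwise rank definition rather than any particular software routine.

```python
import random


def kfold_indices(n_cases, k, seed=0):
    """Shuffle case indices and deal them into k folds of near-equal size."""
    order = list(range(n_cases))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def pooled_cv_scores(n_cases, k, fit_predict, seed=0):
    """Collect one validation score per case across all folds.

    fit_predict(train_idx, val_idx) is a placeholder for training a model on
    train_idx and returning malignancy probabilities for val_idx.
    """
    pooled = [None] * n_cases
    for val_idx in kfold_indices(n_cases, k, seed):
        val_set = set(val_idx)
        train_idx = [i for i in range(n_cases) if i not in val_set]
        for i, score in zip(val_idx, fit_predict(train_idx, val_idx)):
            pooled[i] = score
    return pooled


def auc(labels, scores):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because every case receives exactly one out-of-fold score, a single ROC curve can be drawn for the whole training set instead of averaging ten per-fold curves.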

Applying the Trained Models to the Testing Set
The developed models from the training set were applied to test their performance in the testing set. The model gave each lesion a radiomics score, and these scores were used to generate the ROC curves. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall accuracy of each model were calculated using the threshold of probability ≥0.5 as malignant. The DeLong test was used to compare the difference between paired ROC curves. The difference in proportions between malignant and benign groups was compared using the Chi-square (χ²) test or Fisher's exact test.
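The threshold-based metrics follow directly from the confusion matrix. A minimal sketch, assuming labels coded as 1 for malignant and 0 for benign and using the probability ≥0.5 rule stated above (the function names are hypothetical):

```python
def _ratio(numerator, denominator):
    """Safe division: returns NaN when a metric is undefined."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(labels, probs, threshold=0.5):
    """Sensitivity, specificity, PPV, NPV and accuracy at a probability cut-off."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < threshold)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, tp + fn + fp + tn),
    }
```

Note that a case with probability exactly 0.5 is counted as malignant under this rule.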

Patients' Characteristics and BI-RADS Scores
In the training set, the mean age was 50.0 ± 9.6 in the malignant, and 46.6 ± 9.7 in the benign groups.

Radiomics Diagnostic Models in Training Set
The selected radiomics features for each model are listed in Table 2. The diagnostic sensitivity, specificity, PPV, NPV, accuracy, and AUC obtained from the cross-validation results are summarized in Table 3. The overall accuracy was 83.2% for DCE-MRI. In mammography, the accuracy was 75.7% for lesion-ROI and 64.4% for margin-ROI, and it improved to 77.2% when both were combined. When all MRI and mammography features were combined to build a model, the accuracy improved to 89.6%, which was significantly better than the mammography model (77.2%, p=0.001). The combined model was also better than the MRI model (83.2%), but the difference did not reach significance (p=0.059). By adding mammography features to MRI, the specificity was significantly improved from 69.6% (39/56) to 82.1% (46/56) (p<0.01), while sensitivity was also improved from 88.4% (129/146) to 92.5% (135/146). Figure 6 Figure 2 is an IDC with BI-RADS 5 MRI and BI-RADS 4C mammography, and the malignancy probabilities predicted by the MRI, mammography, and combined models are 0.83, 0.77, and 0.88, respectively; thus, true positive. Figure 3 is a DCIS, also with BI-RADS 5 MRI and a lower BI-RADS 4A mammography, and the combined radiomics probability is 0.62, true positive. Figure 4 is a very small 0.7 cm benign adenosis with BI-RADS 3 MRI and BI-RADS 2 mammography, and the combined radiomics probability is 0.15, true negative. Figure 5 is another adenosis in a younger woman with BI-RADS 4A MRI and BI-RADS 2 mammography, and the combined radiomics probability is 0.11, true negative. These cases demonstrate that the malignancy probability predicted by radiomics models was consistent with BI-RADS reading, and illustrate how the model may help to improve the diagnostic confidence.
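As a consistency check, the reported training-set percentages can be recomputed from the reported counts, assuming accuracy is defined as correctly classified lesions over all 202 training lesions (146 malignant, 56 benign):

```latex
\begin{align*}
\text{Sensitivity}_{\text{MRI}} &= \frac{129}{146} \approx 88.4\%, &
\text{Sensitivity}_{\text{combined}} &= \frac{135}{146} \approx 92.5\% \\
\text{Specificity}_{\text{MRI}} &= \frac{39}{56} \approx 69.6\%, &
\text{Specificity}_{\text{combined}} &= \frac{46}{56} \approx 82.1\% \\
\text{Accuracy}_{\text{MRI}} &= \frac{129 + 39}{202} = \frac{168}{202} \approx 83.2\%, &
\text{Accuracy}_{\text{combined}} &= \frac{135 + 46}{202} = \frac{181}{202} \approx 89.6\%
\end{align*}
```

Under that definition the counts reproduce the stated accuracies. In absolute terms, the combined model correctly classifies 7 more benign lesions (46 vs. 39) and 6 more malignant lesions (135 vs. 129) than the MRI model.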

Performance of the Trained Models in Testing Set
The developed models were then applied to cases in the independent testing set to test the performance. The results are shown in Table 3. In general, the performance of these 5 models was consistent with the validation results in the training set. The accuracy was 78.8% for DCE-MRI, 69.7% for mammography, and improved to 83.3% when using the combined MRI and mammography model.

Performance of the Combined Model in Each BI-RADS Category
In order to further evaluate the performance of the model in each BI-RADS category, the results from the training and testing sets are combined and listed in

DISCUSSION
In this study, we developed radiomics models for diagnosis of breast cancer using DCE-MRI alone, mammography alone, and the combined MRI and mammography. While quite a few studies have reported radiomics models developed using MRI (23,24,34) or mammography (19-22), the combined analysis has rarely been reported (35). We further investigated the complementary role of MRI and mammography features in diagnostic sensitivity and specificity. In the training set, the combined model (89.6%) had a higher accuracy than the individual ones (83.2% for MRI, 77.2% for mammography). When mammography features were added to MRI features, the specificity improved significantly from 69.6% (39/56) to 82.1% (46/56); the combined model thus has the potential to decrease unnecessary biopsy. Interestingly, the sensitivity was also improved, so the higher specificity was not at the expense of compromised sensitivity. Similar findings were seen in the testing set, with a slightly lower overall accuracy of the combined model (83.3% vs. 89.6% in the training set). For mammography, we further separated the analysis using features extracted from the lesion-ROI alone, and from the margin-ROI alone by using a bandshell. The results showed that the accuracy was much better for the lesion model than the margin model, but the margin information could help to improve the accuracy. The results were consistent with the knowledge that the margin plays an important role in characterization of a lesion for diagnosis.
Since MRI is more expensive than mammography, its most established clinical indications are pre-operative staging and high-risk screening. It is not always included in the standard diagnostic workup. It has been shown that in the mammography BI-RADS 4 category, particularly in non-palpable lesions presenting only with microcalcifications, MRI can be used to reduce false positives and avoid unnecessary biopsy (11,36,37). On the other hand, benign lesions may show enhancement on MRI, and the information from mammography may help to rule out malignancy (38). As in the case examples shown in Figures 4 and 5, the benign lesions might be inconspicuous on mammography and had a low BI-RADS score of 2, and we had to use the MIP generated from MRI as a reference to locate them. Since MRI and mammography evaluate different aspects of the underlying pathology, they should be reviewed together to determine which information needs to be weighted more.
Radiomics is becoming an active research field in breast cancer diagnosis. Due to the large number of images acquired using different MR sequences, radiomics provides an efficient analysis method to extract information. Therefore, more radiomics studies have been reported for MRI than for ultrasound, mammography, and 18F FDG PET/CT (34). MRI radiomics was shown to provide better discrimination than conventional parameters for the diagnosis of breast cancer (23,24). Mammography radiomics analysis has also been performed in several diagnostic studies (19-22). However, since the patient cohorts differ, the diagnostic accuracy is highly dependent on the inclusion/exclusion criteria and is not directly comparable among studies. Mao et al. (19) used four modeling algorithms, including SVM, naive Bayes classifier, k-NN classifier, and logistic regression, to differentiate between benign and malignant cases, and showed a wide variation of 0.629-0.978 in the obtained accuracy. The radiologists' reading accuracy was 0.772. Lei et al. (20) applied radiomics to diagnose patients showing BI-RADS 4 calcifications on mammography, and achieved an AUC of 0.80 in the validation cohort. For characterizing microcalcifications, since the lesion area is not well defined, the ROI drawing will affect the extracted features, and thus the diagnostic results. Huang et al. (21) applied mammography radiomics for distinguishing malignant from benign lesions in male patients, and reported an AUC of 0.82, a unique study in rarely reported male patients. Another study by Niu et al. (22) also analyzed patients showing abnormal lesions on mammography and MRI, close to our patient cohort, but their goal was to evaluate the combined effect of mammography and digital breast tomosynthesis (DBT), as well as the combined effect of DCE and diffusion-weighted MRI. The reported accuracy based on mammography was close to ours, around 0.70.
Multi-modal radiomics studies combining different imaging modalities are rarely reported. In a study by Chen et al., the multimodal classifier achieved a better diagnostic performance than any single modality (35). Since each imaging modality is unique in its acquisition method and parameter setting, the extracted features from a lesion may be different and provide complementary information to improve diagnostic accuracy.
In this study, the cases were identified from the MRI database first, and then only those with mammography performed within one month were further selected for analysis. All lesions showed strong enhancement on MRI, and the information was used to determine a corresponding ROI on mammography. Co-registration of MRI and mammography to ensure that the traced ROI indeed comes from the same suspicious tissue is not a trivial task. We used the maximum intensity projection of MRI as guidance, and it could be projected from different angles to simulate the CC and MLO views to guide the tracing of the suspicious tissues on mammography. Some computer techniques have been proposed for registration between MRI and mammography, e.g., using finite element methods by Hopp et al. (39) and Mertzanidou et al. (40), and the thin-plate spline method by Yang et al. (3). These registration techniques can be considered in future multi-modality radiomics studies. However, since the mammography was acquired with the breast heavily compressed and in a different body position, it might be difficult to find the precise correspondence. Therefore, in this study we only analyzed the CC or MLO view that showed the lesion more clearly.
There were several limitations in this study. First, the models were developed using a dataset from a single institution. The earlier cases were used for training, and the performance was evaluated using 10-fold cross-validation. We assembled an independent testing set using later cases according to time of enrollment, so the developed models from training could be independently tested. Second, the sample size was relatively small. In our dataset, all benign lesions had to show visible enhancement on MRI and be histologically confirmed, which were very strict criteria and limited the number of eligible cases. However, since the major goal of this study was to investigate whether and how much the addition of mammography radiomics features can complement MRI, using a strict rule to identify eligible cases with histologically confirmed lesions was needed. Third, while all lesions showed enhancement on MRI, lesions not visible on mammography even with the MRI guidance were not included in this study. Since the boundary of these lesions could not be clearly defined, the radiomics features might not be reliably extracted.
In conclusion, the radiomics models built based on combined MRI and mammography had better diagnostic accuracy than models built using a single modality alone. The combined model reached an accuracy of 89.6% in the training set and 83.3% in the testing set. The motivation of this study was to use the complementary information extracted from radiomics analysis of the lesion shown on the mammogram to decrease the false positive diagnosis of contrast-enhanced benign lesions on MRI. In Western countries, breast MRI is recommended as a clinical modality for screening of women with a high risk of developing breast cancer, and false positive diagnoses in a screening population will lead to many unnecessary procedures, including biopsy, and to patient anxiety. Our study may provide a helpful computer-aided diagnostic tool for such clinical indications. The multimodality radiomics analysis combining mammography and MRI features has the potential to improve the specificity and reduce unnecessary biopsies, while maintaining a high sensitivity for diagnosis of breast cancer.

DATA AVAILABILITY STATEMENT
The datasets used and analyzed in this study will be made available by the corresponding author upon reasonable request.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of The First Affiliated Hospital of Wenzhou Medical University. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.