Radiomic Evaluations of the Diagnostic Performance of DM, DBT, DCE MRI, DWI, and Their Combination for the Diagnosis of Breast Cancer

Objectives This study aims to evaluate digital mammography (DM), digital breast tomosynthesis (DBT), dynamic contrast-enhanced (DCE) MRI, and diffusion-weighted (DW) MRI, individually and combined, for their value in the diagnosis of breast cancer, and to propose a visualized clinical-radiomics nomogram for potential clinical use. Methods A total of 120 patients were enrolled between September 2017 and July 2018, all of whom underwent preoperative DM, DBT, DCE, and DWI scans. Radiomics features were extracted and then selected using least absolute shrinkage and selection operator (LASSO) regression. A radiomics nomogram integrating the radiomics signature and important clinical predictors was constructed and assessed with the receiver operating characteristic (ROC) curve, calibration curve, and decision curve analysis (DCA). Results The radiomics signature derived from DBT plus DM generated a lower area under the ROC curve (AUC) and sensitivity, but a higher specificity, than that from DCE plus DWI. The nomogram integrating the combined radiomics signature, age, and menstruation status achieved the best diagnostic performance in the training (AUCs, nomogram vs. combined radiomics signature vs. clinical model, 0.975 vs. 0.964 vs. 0.782) and validation (AUCs, nomogram vs. combined radiomics signature vs. clinical model, 0.983 vs. 0.978 vs. 0.680) cohorts. DCA confirmed the potential clinical usefulness of the nomogram. Conclusions DBT plus DM provided a lower AUC and sensitivity, but a higher specificity, than DCE plus DWI for detecting breast cancer. The proposed clinical-radiomics nomogram has diagnostic advantages over each modality and can be considered an efficient tool for breast cancer screening.


INTRODUCTION
Breast cancer is a major health concern and the second leading cause of cancer death among women (1). Its prevalence has increased in recent years, mainly owing to the implementation of early screening mammography (2). Although there is still no effective way to prevent breast cancer, studies have shown that early detection and treatment can increase the chance of full recovery for patients (3).
Digital mammography (DM), a two-dimensional technique widely used for detecting breast cancer, has a serious limitation: lesions are frequently obscured by dense fibroglandular and other normal tissues within the breast (4), which often leads to a missed diagnosis or misdiagnosis (5). To address this issue, digital breast tomosynthesis (DBT) rotates the X-ray tube through a limited angle, allowing improved identification of anomalies obscured by normal tissues (6,7). DBT is therefore commonly considered capable of decreasing recall rates and increasing detection rates for breast cancer compared with DM (8). Magnetic resonance imaging (MRI), another popular tool for breast screening, has been demonstrated to be very sensitive in detecting breast cancer (9). However, the relatively low specificity of MRI screening may lead to a high rate of overtreatment (10). In addition, the high examination fees of MRI hinder its clinical application in early breast screening.
In clinical practice, the diagnosis of breast cancer based on DM, DBT, or MRI mainly relies on visual inspection of the morphological changes of breast lesions, including size, shape, and gray level changes, and thus requires experienced clinicians to make decisions. Previous reports have compared the diagnostic capabilities of DM with DBT (11,12) and mammography with MRI (13,14), but all were based on subjective visual examinations and lacked quantified assessments. Recently, radiomics-based computer-aided diagnosis (CAD) has received increasing attention owing to its quantitative advantages (15,16). Using automated data characterization algorithms, radiomics can extract and select discriminative, quantified features from a region of interest; such features have been shown to reflect biological information about the tumor and to correlate highly with disease status (17). Subsequent analyses, including statistics, machine learning classifiers, and nomograms, can reveal associations between imaging features and the underlying pathophysiology (18). Radiomics-based studies on breast cancer have been proposed for predicting axillary lymph node metastasis (19)(20)(21)(22)(23), molecular subtypes (24)(25)(26)(27)(28), tumor grades (29)(30)(31), and treatment responses (32)(33)(34)(35)(36)(37). Some recent studies also conducted radiomics-based quantified analyses for the diagnosis of breast cancer based on DM (38,39), DBT (40,41), and MRI (42,43) separately, and demonstrated improved diagnostic performance using radiomics compared with visual examination by radiologists. A recent effort evaluated T2W, DCE, and DWI separately and in combination, but it ignored the clinical value of mammography screening and did not correlate its findings with clinical evaluation, which may limit its clinical applicability (44).
To our knowledge, direct and quantified comparisons among DM, DBT, and MRI have not been reported. Therefore, the present study aims to widen the understanding of mammography and MRI in breast cancer screening by directly and quantitatively comparing the diagnostic efficiency of each modality individually and in combination. In addition, this study aims to propose a visualized clinical-radiomics nomogram based on the optimal imaging combination and important clinical factors for early assessment of suspected breast lesions.

Patients
This retrospective analysis of breast DM, DBT, and MRI data was approved by the Institutional Research Ethics Board of our institute (Approval No. 2013010). The informed consent requirement was waived. A total of 120 patients [mean age ± standard deviation (SD), 48.81 ± 10.83] were enrolled between September 2017 and July 2018 in our hospital. Of these, 50 patients had pathologically confirmed benign lesions and 70 had pathologically confirmed malignant lesions. Inclusion criteria were as follows: (i) older than 18 years; (ii) underwent DM, DBT, and MRI screening before surgery; and (iii) underwent surgical resection with pathological confirmation. Exclusion criteria were: (i) presence of other tumor diseases; (ii) being in a menstruation, pregnancy, or lactation period; (iii) history of breast surgery, radiotherapy, or chemotherapy, or breast implants; and (iv) artifacts in the images. All patients were randomly divided into training and validation cohorts at a 2:1 ratio using stratified sampling. Clinical factors including age, family history of breast cancer, history of biopsy, and menstruation status were obtained from the electronic medical record system of our hospital.
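The 2:1 stratified split described above can be illustrated with a minimal Python sketch. The function name `stratified_split`, the fixed seed, and the rounding rule are assumptions made for illustration only; the text does not state how the split was implemented.

```python
import random

def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split indices into training and validation sets, class by class,
    so that both cohorts keep roughly the same benign/malignant ratio."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_fraction)  # rounding rule is an assumption
        train_idx.extend(idx[:n_train])
        valid_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(valid_idx)

# 50 benign (0) and 70 malignant (1) lesions, matching the cohort sizes above
labels = [0] * 50 + [1] * 70
train_idx, valid_idx = stratified_split(labels)
```

Splitting each class separately is what makes the sampling stratified: a plain random split of all 120 patients could leave the smaller validation cohort with a noticeably different proportion of malignant lesions.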

Digital Mammography, Digital Breast Tomosynthesis, and Magnetic Resonance Imaging Acquisitions
Preoperative DM and DBT examinations were performed by a radiographer with 10 years of work experience using a DBT scanner (Hologic Selenia Dimensions, Hologic, USA). The acquired images of the compressed breast were reconstructed with a 1-mm intersection spacing to give a slice-by-slice three-dimensional view of the tissue. The number of slices depended on the compressed breast thickness. The following parameters were used for DBT scanning: X-ray tube voltage range: 20.0-49.0 kV (step: 1.0 kV), nominal power: 3.0 kW, current-time range: 300-400 mAs, scanning time: < 4.0 s, reconstruction time: 2.0-5.0 s, and pixel size: 70 μm. The DBT images were interpreted on a Hologic breast computer-aided diagnosis (CAD) workstation (SecureViewDx; Hologic) equipped with two 5-megapixel monitors.
Preoperative MRI scans were performed using a 1.5-T MRI scanner (HDx, GE Healthcare). Axial diffusion-weighted imaging was acquired with the following parameters: b-value: 800 s/mm², repetition time (TR)/echo time (TE)/inversion time (TI): 5,000 ms/64 ms/0 ms, flip angle: 90°, slice thickness: 6 mm, slice gap: 7.5 mm, field of view: 240 mm, matrix size: 128 × 128. The axial VIBRANT sequence (a 3D T1-weighted imaging technique that covers both breasts in conventional or dynamic enhanced scans and yields axial or sagittal images with high signal-to-noise ratio and high resolution) was acquired with the following parameters: TR/TE/TI: 6.2 ms/3.0 ms/13 ms; flip angle: 10°; slice thickness: 3.2 mm; slice gap: 3.2 mm; 48 slices per volume; field of view: 360 mm; matrix size: 350 × 350. The contrast agent (0.1 mmol/kg of Gd-DTPA-BMA, Omniscan, GE Healthcare) was injected intravenously, followed by a 20-mL saline flush, both at a rate of 3 mL/s. After the injection, continuous scans without intervals were performed in eight phases, with a scan time of 43 seconds per phase. All images were stored in Digital Imaging and Communications in Medicine (DICOM) format in the Picture Archiving and Communication System (PACS) of our hospital. The details of the scan parameters are shown in Supplementary Tables S1, S2.

Breast Lesion Segmentation
Regions of interest (ROIs) were manually segmented slice by slice for each patient using the ITK-SNAP software (version 3.6.0) by a radiologist with 12 years of working experience, according to the Breast Imaging Reporting and Data System (BI-RADS). The radiologist was blinded to the pathological results of the patients. The ROIs included the breast lesions and their edges and were exported as compressed files in NII format for further analysis.

Radiomics Feature Extraction
Radiomics features, including 18 first-order statistical, 13 shape-based, and 74 texture features, were extracted from the segmented ROIs using the Pyradiomics package in Python 3.6 (https://pyradiomics.readthedocs.io/en/). The texture feature category consists of gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM), gray level size zone matrix (GLSZM), neighboring gray tone difference matrix (NGTDM), and gray level dependence matrix (GLDM) features. The first-order and texture features were also calculated from the original images filtered with eight types of filters: logarithm, square, gradient, exponential, Laplacian of Gaussian, wavelet, and localbinarypattern2D (45). Detailed descriptions of the features and calculation protocols can be found in a previous report (46).

Feature Selection
To obtain reliable and discriminative features, 30 patients were randomly selected for intraclass correlation coefficient (ICC) analysis (47), 15 from the training group and 15 from the validation group. The ROIs of these patients were segmented independently, in a double-blind manner, by another radiologist with 8 years of working experience. Features with ICC > 0.75 were retained and then further selected with the Mann-Whitney U test; features with P < 0.05 were considered to differ significantly between the benign and malignant groups. Finally, least absolute shrinkage and selection operator (LASSO) logistic regression was used to identify the most discriminative features, with 10-fold cross-validation for selecting the parameter lambda, using the "glmnet" package in R v3.6 (available from URL: https://www.r-project.org) (48).
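As an illustration of the first screening step, the sketch below computes an ICC between two readers for each feature and keeps the features above the 0.75 threshold. The text does not specify which ICC form was used; the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is assumed here, and the feature names and values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per lesion and one column per reader.
    Assumed variant: two-way random effects, absolute agreement, single rater."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-lesion mean square
    msc = ss_cols / (k - 1)              # between-reader mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical feature values: each row is (reader 1, reader 2) for one lesion
features = {
    "feature_a": [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2]],
    "feature_b": [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0], [4.0, 1.0]],
}
retained = [name for name, table in features.items() if icc_2_1(table) > 0.75]
```

In this toy example the two readers agree closely on `feature_a` and disagree on `feature_b`, so only the former would pass on to the Mann-Whitney U test and LASSO steps.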

Development of the Radiomics Signature, Clinical Model, and Nomogram
The radiomics signature (Rad score) was calculated for each patient as a linear combination of the selected features weighted by their respective LASSO coefficients. Logistic regression was used to identify the discriminative clinical predictors. A clinical model was established using multivariate logistic regression with Akaike's Information Criterion (AIC) as the stopping rule (49). A radiomics nomogram for differentiating benign and malignant lesions was constructed by incorporating the radiomics signature and the most important clinical factors using the "rms" package in R v.3.6.
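The linear combination described above can be sketched in a few lines of Python. The feature names, coefficient values, and intercept below are hypothetical placeholders, not the fitted values from this study, and the logistic transform is shown only as the usual way a linear score is mapped to a probability.

```python
import math

def rad_score(features, coefficients, intercept=0.0):
    """Weighted sum of selected features; weights stand in for LASSO coefficients."""
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)

def logistic(z):
    """Map a linear predictor to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and one hypothetical patient
coefficients = {"feature_1": 0.5, "feature_2": 1.5}
patient = {"feature_1": 2.0, "feature_2": -1.0}
score = rad_score(patient, coefficients, intercept=0.25)
probability = logistic(score)
```

Features that LASSO shrinks to zero simply drop out of the sum, which is why the signature depends only on the small set of retained features.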

Statistical Analysis
Continuous variables were compared with the t-test or the Mann-Whitney U test, with normality assessed by the Shapiro-Wilk test, and discrete variables were compared with the chi-square test. All hypothesis tests were two-sided. ROC curve analysis was performed to evaluate the diagnostic performance of each model, with the area under the ROC curve (AUC), accuracy, sensitivity, and specificity calculated as comparison metrics. The optimal cutoff value was taken as the point on the ROC curve with the maximum Youden index (50). ROC curves were compared with the DeLong test using the "pROC" package in R. Calibration curves were plotted to assess the agreement between the model-predicted results and the true outcomes. Decision curve analysis (DCA) (51) was performed using the "rmda" package to assess the potential clinical usefulness of the models.
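The AUC and the Youden-index cutoff mentioned above can be illustrated with a small, self-contained Python sketch. This is not the "pROC" implementation used in the study; the function names and the toy scores are assumptions, and a lesion is called positive when its score is at or above the threshold.

```python
def auc(scores, labels):
    """AUC as the probability that a malignant case outscores a benign one
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_cutoff(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing
    J = sensitivity + specificity - 1 over all observed scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (t, sens, spec)
    return best

# Toy model outputs: 1 = malignant, 0 = benign
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1]
area = auc(scores, labels)
cutoff, sensitivity, specificity = youden_cutoff(scores, labels)
```

The Youden index weights sensitivity and specificity equally, so the chosen cutoff is the point on the ROC curve farthest above the chance diagonal.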

Patient Characteristics
The clinical characteristics of the patients were statistically analyzed and shown in Table 1. The age and menstruation status were significantly different between the benign and malignant groups (P < 0.05). No statistical difference was observed in the types of family history and history of biopsy. A clinical model was built integrating the age and menstruation status for detecting malignant lesions.

Evaluation of Diagnostic Performance of Digital Mammography, Digital Breast Tomosynthesis, and Magnetic Resonance Imaging
The diagnostic performance of the radiomics signatures derived from DM, DBT, DCE, and DWI, individually and in combination, was assessed (Table 2). Figure 1 shows the ROC curves of each radiomics signature. The results indicated that DCE generated the highest AUCs and sensitivities among the four modalities but had relatively low specificities. The sensitivity of DWI plus DCE was significantly higher than that of DM plus DBT. In addition, DWI plus DCE yielded the highest positive predictive values (PPV) and the lowest misdiagnosis rates.

Development of the Combined Radiomics Signature and Nomogram
Radiomics features selected from the four modalities were combined and further selected to generate a combined feature set consisting of seven features: three from DBT, two from DCE, and two from DWI. The diagnostic performance of each feature was evaluated and tabulated. A radiomics nomogram was constructed integrating the combined Rad score with age and menstruation status (Figure 2A). The risk of a lesion being malignant can be read off the scale in the last row by drawing a vertical line down from the total points. Calibration curves are shown in Figures 2B, C, indicating acceptable agreement between the nomogram-estimated probabilities and the actual outcomes of the lesions. The 45-degree blue line and the red dotted line represent an ideal diagnosis and the performance of our nomogram, respectively; the closer the red dotted line lies to the blue line, the better the diagnostic performance. Further results for the models are presented in Figures 2D, E and Table 4. Figure 3 shows the results of the decision curve analysis for each model. The nomogram exhibited a greater net benefit than the combined Rad score or the clinical model. When the threshold probability was between 0.44 and 0.68, or over 0.78, a greater benefit could be obtained by using the nomogram, indicating good potential for clinical application.
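For readers unfamiliar with decision curve analysis, the net benefit at a given threshold probability is conventionally computed as TP/N minus FP/N weighted by the odds of the threshold. The Python sketch below shows that standard formula on toy data; it is not the "rmda" implementation, and the function names and example values are assumptions.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of treating every lesion whose predicted probability of
    malignancy is at or above the threshold (0 < threshold < 1)."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every lesion as malignant."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy predictions: 1 = malignant, 0 = benign
probabilities = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [1, 1, 0, 0, 0]
nb_model = net_benefit(probabilities, labels, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
```

A decision curve plots this quantity over a range of thresholds for each model alongside the treat-all and treat-none (net benefit of zero) strategies; a model is useful at a threshold where its curve lies above both references.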

DISCUSSION
Prior to this study, research had evaluated the diagnostic capabilities of DM (32,38,39), DBT (40,41), and MRI (42)(43)(44) separately for detecting breast cancer, all based on subjective visual examinations and lacking direct and quantitative comparisons of different modalities. In contrast, this study performed comprehensive radiomics analyses to quantitatively assess the diagnostic performance of different modalities separately and in combination. We found that the radiomics signature derived from DM always showed the worst diagnostic performance in terms of AUC, sensitivity, and specificity compared with the other individual modalities. A possible explanation is that DM obtains only one image, in which glands may overlap, and hence is not sufficient to analyze the distribution of dense and adipose tissues (52). This result accords with previous studies showing that DM-based diagnosis often leads to high false negative and false positive rates because lesions may be obscured or hidden by overlapping fibroglandular tissues (5,53). The addition of DBT to DM can significantly improve the diagnostic AUC, accuracy, specificity, PPV, and NPV, and generates a sensitivity similar to that of DM alone. This is in line with previous reports indicating that breast DBT can improve AUC and specificity in visual assessments (54,55). This may be because DBT improves lesion visibility by providing thin-section tomographic images and reducing the overlap of breast tissues, and hence shows a clearer edge, shape, and structure of the lesion. The addition of DBT to DM did not improve the diagnostic sensitivity by visual assessments compared with DM alone as reported in an earlier study (14). The discordance may be because that study was performed in a cancer-only population.
DCE plus DWI yielded higher AUCs and sensitivities, but lower specificities, than DM plus DBT. This result is partially in line with previous literature indicating that MRI is superior to X-ray technology in diagnostic AUC and sensitivity but weaker in specificity (14,56). DBT showed a similar diagnostic AUC, slightly higher specificity, and lower sensitivity compared with DCE or DWI, in line with previous research that also demonstrated the inferior sensitivity of breast DBT compared with MRI by visual examination (14,53,57). A possible explanation is that DCE reflects the neoangiogenesis within the tumor that is associated with the growth and progression of malignant tumors (58), whereas DWI represents tissue microenvironments and membrane integrity by depicting the diffusivity of the tissues (59). Therefore, MRI tends to be more sensitive than DBT or DM for tumors with higher degrees of malignancy. DCE yielded higher AUC, accuracy, sensitivity, and specificity than DWI, which may be due to the higher resolution and the use of a contrast agent in DCE (44). We found that the addition of DBT to MRI (DBT plus DCE plus DWI) increased the AUC and sensitivity compared with MRI alone (DCE plus DWI). This indicates that DBT and MRI are complementary and that their combination can significantly improve the predictive capability. However, our results were inconsistent with a previous report that showed no improvement in diagnostic sensitivity from combining DM, DBT, DCE, and ultrasound (60). Because that study involved ultrasound, a direct comparison between our study and theirs was not possible.
In clinical practice, although integrating MRI with X-ray imaging allows radiologists to make judgments more easily, the diagnosis still relies on subjective experience. We selected a total of seven quantitative features as the most important predictors: three from DBT, two from DCE, and two from DWI. One was an original feature and six were transformed features. The combined Rad score integrating these features significantly improved the diagnostic performance compared with any modality alone. The Original_glcm_ClusterShade feature measures the skewness and uniformity of the gray level co-occurrence matrix within the tumor. A higher value of this feature implies greater asymmetry about the mean and greater heterogeneity of the lesion. We found that this feature was larger in malignant lesions than in benign lesions, which suggests that a tumor with a more asymmetric and complex texture tends to be malignant. Among the six transformed features, one belonged to the first-order class and five belonged to the textural feature class. First-order features describe the distribution of voxel intensities in the image region, whereas textural features quantify the complexity of a tumor and the thickness of the texture. Our findings suggest that tumor heterogeneity may be closely related to breast cancer, since textural features in medical images often reflect tumor heterogeneity. The results are partially in line with previous studies that also highlighted the correlations between textural features and breast cancer (61,62). Our findings may also explain why the proposed combined Rad score can significantly improve the diagnostic performance with regard to AUC and sensitivity compared with visual assessments: most of the identified features (6 of 7) were derived from transformed images generated by filtering the original images with various filters, and thus can hardly be perceived by humans.
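To make the cluster shade feature concrete, the sketch below builds a symmetric, normalized GLCM for a single horizontal offset on a tiny 2D image and evaluates the usual definition, the sum over gray-level pairs of (i + j - μx - μy)³ p(i, j). It is a simplified illustration, not the Pyradiomics implementation, which among other things aggregates over several directions; the helper names and toy images are assumptions.

```python
def glcm(image, levels, offset=(0, 1)):
    """Symmetric, normalized gray level co-occurrence matrix for one offset."""
    dr, dc = offset
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1  # count the pair in both directions
                counts[j][i] += 1  # so that the matrix is symmetric
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]

def cluster_shade(p):
    """Third-order moment of (i + j) about its mean under the GLCM p."""
    levels = range(len(p))
    mu_x = sum(i * p[i][j] for i in levels for j in levels)
    mu_y = sum(j * p[i][j] for i in levels for j in levels)
    return sum((i + j - mu_x - mu_y) ** 3 * p[i][j] for i in levels for j in levels)

# Mostly dark patch with one bright pixel, and its gray-level mirror image
dark_patch = [[0, 0, 0], [0, 0, 0], [0, 0, 3]]
bright_patch = [[3, 3, 3], [3, 3, 3], [3, 3, 0]]
shade_dark = cluster_shade(glcm(dark_patch, levels=4))
shade_bright = cluster_shade(glcm(bright_patch, levels=4))
```

The two patches give cluster shade values of equal magnitude and opposite sign, while a perfectly uniform patch gives zero, which is the sense in which the feature captures asymmetry of the gray-level distribution about its mean.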
A clinical model was built integrating age and menstruation status, and showed a lower AUC, sensitivity, and specificity than the combined Rad score. The nomogram incorporating the combined Rad score with the age and menstruation status achieved the best overall diagnostic performance compared with the combined Rad score, clinical model, and BI-RADS assessment. Decision curves demonstrated a better clinical usefulness of the nomogram with more net benefits across the majority of the range of threshold probabilities. Therefore, we suggest that our nomogram may be considered as an effective tool that can assist in decision making for the diagnosis of breast cancer. To use our nomogram, radiologists need to manually segment lesions on the DBT and MRI images for each patient, then calculate the probability of being benign or malignant. After that, clinicians can incorporate the nomogram-predicted probabilities with other clinical information to give a comprehensive decision on further examinations and treatments.
This study has limitations. First, this retrospective study had a relatively small sample size, which may cause inherent bias. Second, all data were obtained from a single hospital. Further multi-center trials are warranted to confirm the present findings. Third, our radiomic methods rely on manual segmentations of the ROIs, which were subjective and time-consuming. Future studies are needed to explore deep learning-based automatic segmentation methods on breast data.

CONCLUSIONS
Our results showed that DBT performed similarly to DCE and DWI in terms of AUC and sensitivity, but better in specificity, for detecting malignant lesions. DBT plus DM provided a lower AUC and sensitivity, but a higher specificity, than DCE plus DWI. The proposed nomogram achieved the best diagnostic performance and may help clinicians make precise decisions regarding treatments.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
All analyses of human data conducted in this study were reviewed and approved by the Institutional Review Board of the Cancer Hospital of China Medical University and were in accordance with the ethical standards of the institutional and/or national research committee. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
XJ and SN contributed to the study concepts and manuscript preparation. SN and XW contributed to the study design. NZ and GL contributed to the data acquisition. SN and NZ contributed to the quality control of the data and algorithms. XJ and XW contributed to the data analysis and interpretation. YL and E-NC contributed to the statistical analysis. XJ, YD and YK contributed to the manuscript review. All authors contributed to the article and approved the submitted version.