Using Ultrasound-Based Multilayer Perceptron to Differentiate Early Breast Mucinous Cancer and its Subtypes From Fibroadenoma

Objectives: Mucinous breast cancer (MBC), particularly pure MBC (pMBC), is often confused with fibroadenoma (FA) because both present as firm masses with similar imaging appearances. As a result, some MBC cases are misdiagnosed as FA, which may lead to a poor prognosis. We analyzed ultrasonic features and evaluated the ability of a multilayer perceptron (MLP) to distinguish early MBC and its subtypes from FA.
Materials and Methods: The study included 193 patients diagnosed with pMBC, mixed MBC (mMBC), or FA. The area under the curve (AUC) was calculated to assess how well age and 10 ultrasound features differentiated MBC from FA. Pairwise comparisons were used to examine differences among the MBC subtypes (pure and mixed) and FA. An MLP was used to differentiate MBC and its subtypes from FA.
Results: Nine variables had AUCs over 0.5: age, echo pattern, shape, orientation, margin, echo rim, vascularity distribution, vascularity grade, and tumor size. In the subtype analysis, significant differences among pMBC, mMBC, and FA were found in 10 variables (p-value range, <0.001–0.037), all except posterior features. With the MLP, the AUCs for predicting MBC and FA were both 0.919; the AUCs for predicting pMBC, mMBC, and FA were 0.875, 0.767, and 0.927, respectively.
Conclusion: MLP models based on ultrasonic characteristics and age distinguished MBC and its subtypes from FA well. They may provide critical insight for the preoperative clinical management of MBC.


INTRODUCTION
Mucinous breast cancer (MBC) accounts for about 2% of all invasive breast carcinomas (1), and its reported prevalence ranges from 1% to 6% of all breast cancers (2). According to the WHO classification, MBCs are divided into pure (pMBC) and mixed (mMBC) types based on the lesion's mucin content. A pMBC consists exclusively of tumor tissue with a mucinous component above 90%, whereas in an mMBC the mucinous areas cover more than 50% but less than 90% of the total area and are usually admixed with an infiltrating ductal epithelial component (2,3). Metastatic disease rates of 12% to 14% have been reported for MBC in case series (4). pMBC has better overall survival than mMBC (3). Clinically, MBCs are palpable, firm masses and are often confused with fibroadenomas (FAs). Some are misdiagnosed as FAs, which delays treatment and can result in axillary node metastasis, chemotherapy, and shortened disease-free survival. It is therefore essential to differentiate early MBCs and their subtypes from FAs precisely by radiological methods.
Mammography, magnetic resonance imaging (MRI), and ultrasound (US) are the main imaging techniques for detecting breast masses and making a preliminary judgment of their histological properties. Mammographic mass detection is inefficient in dense breast tissue and in MBCs (5,6). MRI is very expensive and has been associated with a high false-positive rate for breast cancers (7). In contrast, US is inexpensive, non-radioactive, and widely available, and is therefore the preferred radiological means of diagnosing breast masses, especially in dense breast tissue (8).
Currently, the American College of Radiology Breast Imaging Reporting and Data System (ACR BI-RADS) lexicon is the most widely used system for evaluating breast lesions. In practice, some MBCs and FAs have similar imaging appearances. Some MRI studies based on the lexicon have focused on differentiating MBCs from FAs (9,10). Although one of these studies identified optimal characteristics related to MBCs, it did not analyze their association with the subtypes (10). Previous US studies have only described the features of each MBC subtype (11-13) and could not predict MBCs, their subtypes, and FAs from a single clinical or ultrasonic feature. An integrated approach, such as machine learning, is therefore needed.
As a machine learning method, the multilayer perceptron (MLP) performs very well on nonlinear data (14), has high fault tolerance, and can solve complex problems (15,16). Previous ultrasonic studies have classified malignant tumors well using MLP (17,18). To the best of our knowledge, no ultrasonic study has used MLP to distinguish MBC and its subtypes from FA on the basis of ultrasonic characteristics. In this study, we analyzed the ultrasonic features of MBC subtypes and FA using MLP and assessed whether MLP can classify them well enough to improve diagnostic performance for early MBC subtypes and FA.

MATERIALS AND METHODS
Participants and Study Design
This retrospective study was approved by the Research Ethics Committee of Guangdong Provincial People's Hospital, and the requirement for informed consent was waived because of its retrospective design. The histological characteristics of the included breast masses were gathered from pathology reports. From January 1, 2013 to December 30, 2019, 61 patients with pMBC and 31 patients with mMBC were enrolled. From January 1, 2019 to May 31, 2019, 101 consecutive patients with FA were enrolled, because FAs were the most common. Patients ranged in age from 15 to 82 years (mean, 43.64 ± 14.40 years).
The inclusion criteria were the following: (1) breast masses identified as pMBCs, mMBCs, or FAs by histological examination; (2) patients with a single mass; and (3) patients with MBC without axillary node or distant metastasis.
The exclusion criteria were the following: (1) lesions that were metastatic tumors; (2) patients exposed to systemic hormone therapy or adjuvant chemotherapy; and (3) lesions larger than 6 cm.

Ultrasonic Image Acquisition and Interpretation
Ultrasonic images were acquired using a 14-MHz linear transducer (Toshiba Aplio 500, Canon Medical Systems Corp., Tokyo, Japan). Images of the masses were collected in a standard manner, containing at least two orthogonal planes (the radial and antiradial planes or the transverse and longitudinal planes), by two breast radiologists (reader 1 with 10 years' and reader 2 with 5 years' experience) following the ACR BI-RADS fifth edition classification scheme. As directed by the guide and a previous article (19), the two radiologists kept a strict record of US features. Both were blinded to the histological outcome but not to patient age. The ultrasonic characteristics comprised 10 items: echo pattern, shape, orientation, margin, posterior features, tumor size, calcifications, echogenic rim, vascularity distribution, and vascularity grade. Detailed feature descriptions are presented in the data supplement (Appendix 1).
For the records of each ultrasonic feature, any disagreements between the two readers were resolved by final consensus following discussion.

Statistical Analysis
Statistical analysis was conducted with SPSS software (Version 22.0, IBM Corp., Armonk, NY, USA). All tests were two-sided, and p < 0.05 was considered statistically significant.

Comparison of the MBC and FA Groups and Multiple Comparisons of pMBC, mMBC, and FA
Differences between MBC and FA in ultrasonic features and age were evaluated. Continuous variables were compared using the Mann-Whitney U test or t-test. Categorical variables were compared using the chi-square test or Fisher's exact test.
Multiple comparisons among pMBC, mMBC, and FA were also made for ultrasonic features and age. Here, continuous variables were compared using the least significant difference (LSD) test, whereas categorical variables were compared using the Kruskal-Wallis test.

Predicting MBC and FA
For each ultrasonic feature and age, receiver operating characteristic (ROC) curves were plotted using the ROC procedure in SPSS Statistics. The area under the curve (AUC), sensitivity, and specificity were calculated automatically by SPSS Statistics from these curves. The Youden index equals sensitivity plus specificity minus one. The sensitivity, specificity, and Youden index are presented for the variables with AUCs over 0.5.
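The quantities above can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study: the function names, the toy ages, and the labels are all hypothetical, and the AUC is computed with the standard rank-based (Mann-Whitney) formulation.

```python
def auc_rank(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_youden(scores, labels):
    """Scan every observed value as a cut-off (positive if score >= cut-off)
    and return (cutoff, sensitivity, specificity, youden) for the cut-off
    with the largest Youden index; the first maximum wins on ties."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        youden = sens + spec - 1  # Youden index = sensitivity + specificity - 1
        if best is None or youden > best[3]:
            best = (cutoff, sens, spec, youden)
    return best


# Hypothetical toy data, NOT from the study: age in years, 1 = MBC, 0 = FA.
ages = [25, 31, 38, 44, 47, 52, 58, 66]
is_mbc = [0, 0, 0, 1, 0, 1, 1, 1]

auc = auc_rank(ages, is_mbc)
best = best_youden(ages, is_mbc)
print(auc, best)
```

An AUC of 0.5 corresponds to chance-level ranking, which is why variables at or below that value are treated as unable to separate the two groups on their own.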
In addition, the Multilayer Perceptron procedure in SPSS Statistics was used to distinguish MBC from FA. On completion, SPSS Statistics automatically reported the AUC of the MLP and the importance of the features.
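To make concrete how an MLP turns a set of features into class probabilities, the following sketch runs a single forward pass through one hidden layer. The architecture (tanh hidden units, softmax output), the three inputs, and every weight are assumptions chosen for illustration; the source does not report the fitted network, and this is not the SPSS model.

```python
import math


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer perceptron.
    Hidden units use tanh; the output layer uses softmax so that the
    returned values are class probabilities summing to one."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: three standardized inputs (age, size, margin),
# two hidden units, two output classes ordered as [MBC, FA].
x = [0.8, 1.2, 1.0]
w_hidden = [[0.9, 0.7, 0.5], [-0.4, 0.2, -0.6]]
b_hidden = [0.0, 0.1]
w_out = [[1.5, -1.0], [-1.5, 1.0]]
b_out = [0.0, 0.0]

probs = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(probs)
```

In a trained network the weights are learned from the data rather than set by hand; the nonlinearity of the hidden layer is what lets the model combine features that are individually weak predictors.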

Predicting MBC Subtypes and FA
MLP was also used to distinguish the MBC subtypes from FA, following the method described in the previous section. The AUC of the MLP and the importance of the features were obtained.
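With three outcome classes, one AUC is typically reported per class by treating that class as positive and the other two as negative. The sketch below shows this one-vs-rest calculation on hypothetical predicted probabilities; the function names and all numbers are illustrative assumptions, not study data or SPSS output.

```python
def auc_rank(scores, flags):
    """Rank-based AUC: share of positive/negative pairs in which the
    positive case has the higher score (ties count as one half)."""
    pos = [s for s, f in zip(scores, flags) if f == 1]
    neg = [s for s, f in zip(scores, flags) if f == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_rows, true_labels, classes):
    """For each class, score every case by its predicted probability for
    that class and compute the AUC of 'this class' versus 'all others'."""
    result = {}
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        flags = [1 if y == cls else 0 for y in true_labels]
        result[cls] = auc_rank(scores, flags)
    return result


# Hypothetical predicted probabilities, columns ordered as in `classes`.
classes = ["pMBC", "mMBC", "FA"]
prob_rows = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.5, 0.2],
    [0.4, 0.2, 0.4],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
true_labels = ["pMBC", "pMBC", "mMBC", "mMBC", "FA", "FA"]

aucs = one_vs_rest_auc(prob_rows, true_labels, classes)
print(aucs)
```

When there are only two classes and their predicted probabilities sum to one, the two one-vs-rest AUCs coincide, which is why a binary model yields the same AUC for both outcomes.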

Clinical Use
The two MLP models can be saved as XML files. When new data are available, the file can be loaded directly in SPSS to calculate the probability of each type (MBC, its subtypes, or FA), as illustrated in the data supplement (Appendix 2).

RESULTS
Comparison of MBC and FA and Multiple Comparisons of pMBC, mMBC, and FA
Patients' ages and the 10 detailed ultrasonic characteristics are summarized in Table 1. There were significant differences between MBC and FA in 10 variables (p-value range, <0.001-0.004), all except posterior features (Table 1).
In the subtype analysis, one-way ANOVA showed statistically significant differences among the pMBC, mMBC, and FA groups in 10 variables (p-value range, <0.001-0.037), all except posterior features (p = 0.630). The multiple comparisons of the 10 variables with statistically significant differences are outlined in Table 1.

Predicting MBC and FA
AUCs were calculated for all 11 variables for differentiating MBC from FA. Nine variables had AUCs over 0.5: age, echo pattern, shape, orientation, margin, echo rim, vascularity distribution, vascularity grade, and size. Their AUCs, sensitivity, specificity, and Youden index are displayed in Table 2. The AUCs of posterior features and calcification were below 0.5, indicating that these two variables alone could not distinguish MBC from FA.
For predicting MBC and FA, the ROC curves of the MLP are plotted in Figure 1; the AUCs for MBC and for FA were both 0.919. The importance of the features is depicted in Figure 2.

Predicting MBC Subtypes and FA
The AUCs of the MLP for predicting pMBC, mMBC, and FA were 0.875, 0.767, and 0.927, respectively, and the ROC curves of the MLP are plotted in Figure 3. The importance of the features is plotted in Figure 4.

Clinical Use
The two MLP models can be saved in XML format for the analysis of new data; illustrations of their application are shown in the data supplement (Appendix 2).

DISCUSSION
In this study, we analyzed the differences between MBC and FA and performed pairwise comparisons among the MBC subtypes and FA. For differentiating MBC from FA, age had the highest sensitivity, specificity, and Youden index, and the other eight variables showed modest values. We then used the MLP to predict MBC, its subtypes, and FA. The MLP models based on ultrasonic characteristics and age predicted MBC, its subtypes, and FA well.
Our study differs from previous studies, which focused on reporting the correlation between ultrasonic imaging features and histological signs (12,13). In addition, one study proposed automated breast volume scanning and ultrasound elastography for predicting breast cancer, but MBC was only one of several breast cancer subtypes studied (20). These studies did not investigate the differences between MBC subtypes and FA in sufficient depth.
Our study found that age and the ultrasonic features, except for posterior features and calcification, could differentiate MBC from FA based on AUCs, but the effectiveness of the ultrasonic features was moderate or poor. Moreover, the AUCs for predicting MBC do not apply to predicting each subtype and FA. The multiple comparisons among pMBC, mMBC, and FA showed differences in 10 variables (Table 1), but no single feature could predict the MBC subtypes and FA. A single feature therefore could not predict MBC, its subtypes, and FA well, and a more efficient tool was needed.
Before using MLP, we tried multinomial regression analysis, a traditional statistical method used in a similar study (21). However, the results were not satisfactory: the Cox and Snell pseudo R² was 0.495, and the p-value of the Pearson goodness-of-fit test was <0.001. The closer the R² and p-value are to 1, the better the fit of the model, so these values indicated that our model fit poorly and was not meaningful.
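For reference, the Cox and Snell statistic quoted above has the standard definition below. The symbols are chosen here and do not come from the source: L_0 is the likelihood of the intercept-only model, L_M the likelihood of the fitted model, and n the sample size. Because the likelihood of a categorical-outcome model cannot exceed 1, the statistic has an upper bound below 1, which is worth keeping in mind when reading a value such as 0.495.

```latex
R^2_{\mathrm{CS}} = 1 - \left(\frac{L_0}{L_M}\right)^{2/n},
\qquad
\max R^2_{\mathrm{CS}} = 1 - L_0^{\,2/n} < 1
```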
Our study showed that combining ultrasonic characteristics with age in an MLP can predict MBC, its subtypes, and FA well. The two MLP importance charts showed that the features differed in importance. The top five features were age, size, margin, posterior features, and echo rim (Figures 2 and 4). As far as we know, no previous study has assessed the importance of ultrasonic features for MBC and its subtypes.
Age and tumor size were the strongest predictors of MBC, its subtypes, and FA. The older a patient is, the more likely the patient is to develop breast cancer (22,23). Tumor size remains an important risk factor for predicting MBC, especially pMBC. In terms of tumor biology, the more rapidly a tumor grows, the greater the likelihood of malignancy, whereas a benign tumor can remain stable in size for many years or grow slowly. Not circumscribed margins and calcification within masses were more positively correlated with mMBC, which contains less mucin and more of the no-special-type component. According to Table 2, the AUC of posterior features was below 0.5, so this feature alone cannot differentiate MBC from FA. However, posterior features ranked among the top five features in the MLP. Posterior enhancement was most common in pMBC because pMBC contains more extracellular mucin and transmits sound better than mMBC and FA. An enhanced echogenic rim was more common in pMBC and less common in FA. In previous studies, the perifocal hyperechoic zone was associated with malignancy due to histological lymphatic invasion of the surrounding breast tissue (24,25). In our study, although no single feature predicted MBC well, strong predictive ability was obtained by combining all features through MLP, especially for predicting FA and MBC (AUC, 0.919). MLP thus proved to be a good classifier for this complex problem, as in a previous study (15).
Our study has several limitations. First, the sample size was relatively small; prospective studies with large datasets are needed to validate our results. Second, the features did not include clinical risk factors because of the incomplete nature of retrospective data; prospective studies with complete datasets (e.g., BMI, serological examination) should be conducted. Third, our feature assessment depended heavily on subjective analysis, with inevitable bias; studies of objective parameters (e.g., ultrasonic radiomics, contrast enhancement) are needed. Finally, the MLP can solve complex classification problems and is highly practical, but the interpretability of each feature is poor. Other machine learning methods could be tried for this classification in the future.
In summary, the ultrasound characteristics of MBC, particularly pMBC, tend to be similar to those of FA. Our study found that combining ultrasound characteristics and age in an MLP can predict MBC, its subtypes, and FA well. This may provide critical insight for the preoperative clinical management of MBC.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Guangdong Provincial People's Hospital. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
TL, JS, and CH conceived and designed the study. TL, SZ, and SC collected the clinical and image data. SZ, SC, and JL read and kept the record of all images. TL and JS wrote the manuscript. SP and SS reviewed and re-edited the manuscript. All authors contributed to the article and approved the submitted version.