The Utility of Texture Analysis Based on Breast Magnetic Resonance Imaging in Differentiating Phyllodes Tumors From Fibroadenomas

Background: The preoperative diagnosis of phyllodes tumors (PTs) of the breast is critical for appropriate surgical treatment. However, reliable differentiation between PT and fibroadenoma (FA) remains difficult in daily clinical practice. The purpose of this study was to investigate the utility of breast MRI texture analysis for differentiating PTs from FAs. Materials and Methods: Forty-two PTs and 42 FAs were enrolled in this retrospective study. Clinical and conventional MRI features (CCMF) and MRI texture analysis were used to distinguish between PT and FA. Texture features were extracted from axial short TI inversion recovery T2-weighted (T2W-STIR), T1-weighted pre-contrast, and two contrast-enhanced series (first and third post-contrast). The Mann–Whitney U test was used to select statistically significant features from the texture analysis and CCMF. Linear discriminant analysis was then applied to these features to determine the most discriminative ones. A K-nearest neighbor classifier and ROC curve analysis were used to evaluate diagnostic performance. Results: With higher classification accuracy (89.3%) and an AUC of 0.89, texture features on T2W-STIR outperformed texture features on the other MRI sequences and CCMF. The AUC of the combination of CCMF with texture features on T2W-STIR was significantly higher than that of CCMF or texture features on T2W-STIR alone (p < 0.05). With a classification accuracy of 95.2% and an AUC of 0.95, the combination strategy performed better than texture features on T2W-STIR or CCMF separately. Conclusions: Texture features on T2W-STIR showed better diagnostic performance than CCMF for distinguishing PTs from FAs.
After further validation on large multi-institutional datasets, MRI-based texture features may become a potential biomarker and a useful medical decision tool in clinical trials involving patients with breast fibroepithelial neoplasms.


INTRODUCTION
Phyllodes tumor (PT) is a rare tumor, accounting for 0.3–1.0% of all mammary tumors and 2–3% of all fibroepithelial mammary neoplasms (1,2). Histologically, PTs are classified as benign, borderline, or malignant (3); however, histological type correlates poorly with clinical behavior (4,5). The incidence of local relapse is high regardless of histological grade, and distant metastasis may occur in approximately 25% of malignant PTs (6,7). Because of similar clinical features and histopathological appearance, PT may mimic fibroadenoma (FA), the most common benign tumor of the breast. Even preoperative invasive procedures such as fine-needle aspiration cytology and core needle biopsy sometimes fail to differentiate these two entities correctly, mainly because of inadequate or unrepresentative sampling (8,9). Given their different prognoses, surgical excision with a wide margin of at least 1 cm is essential for all grades of PT to avoid local relapse and subsequent surgery (10,11), whereas an FA can usually be safely followed up or managed by simple enucleation (12). Therefore, accurate preoperative diagnosis is crucial for choosing an appropriate clinical strategy and avoiding operative complications resulting from inadequate excision or surgical overtreatment.
Clinically, in contrast to FA, PT can grow rapidly into a large, bulky mass and has a high reported incidence of local relapse (13). In addition, PT is generally thought to develop later in life than FA (6,14).
Previous reports have shown MRI features to be valuable in differentiating PTs from FAs. Kamitani et al. (15) described the MRI features of PTs and noted heterogeneous enhancement, internal cystic components, and increased lobulation. Although certain clinical and MRI features may raise the index of suspicion, reliable differentiation between PT and FA remains challenging. In daily clinical practice, a benign PT, or a small borderline or malignant PT, can easily be mistaken for an FA, whereas giant FAs may show MRI features that overlap with those of PTs.
Radiomics has drawn increasing attention in recent years. It is based on the hypothesis that medical images can be converted into quantitative, mineable features through automated high-throughput extraction with data-characterization algorithms, and that these features can in turn provide valuable diagnostic, prognostic, or predictive information (16–18). Several radiomics studies have shown that quantitative imaging signatures, such as texture features derived from MRI, can support clinical decision-making in oncology non-invasively and at low cost. For example, texture analysis has been used to predict sentinel lymph node metastasis in breast cancer (19), differentiate molecular subtypes of estrogen receptor-positive breast cancer (20), and distinguish healthy breast tissue from breast cancer lesions (21). Thus, in the present study, we hypothesized that texture features on routine non-enhanced T1- and T2-weighted and contrast-enhanced T1-weighted MR images could help to improve the differentiation between PTs and FAs.

Patients
The retrospective study protocol was approved by our institutional review board. In this study, 53 female patients with histologically confirmed PT between June 2012 and June 2018 were enrolled, and 78 female patients with histologically confirmed FA were randomly selected. The inclusion criteria were as follows: (1) histological diagnosis of PT or FA by two experienced pathologists based on surgical resection specimens, (2) breast MRI performed before surgical resection, and (3) lesion diameter >1 cm, to avoid possible adverse effects of small lesion size on the texture features extracted from the image data. The exclusion criteria were (1) a previous history of breast cancer and radiotherapy and (2) poor image quality. Finally, 41 female patients with 42 PTs and 37 female patients with 42 FAs were eligible for this study.

MRI Acquisition
All patients were scanned using a 1.5 T dedicated breast MRI system (Aurora Dedicated Breast MRI Systems) with a single-channel breast coil. For dynamic imaging, gadolinium-diethylenetriamine pentaacetic acid (Gd-DTPA, Magnevist) was injected intravenously as a bolus of 0.2 mL/kg body weight at a rate of 2 mL/s, followed by a 20-mL normal saline flush. A dynamic series of transverse fat-suppressed T1-weighted images was acquired before contrast injection and at 90, 270, 450, and 630 s after injection with the following parameters: TR = 29 ms, TE = 4.8 ms, slice thickness = 1.1 mm, matrix = 360 × 360 × 128, and FOV = 36 cm. In addition, axial short TI inversion recovery T2-weighted (T2W-STIR) images were acquired with the following parameters: TR = 6,680 ms, TE = 68 ms, slice thickness = 3.0 mm, matrix = 320 × 192, and FOV = 36 cm. Fat suppression was achieved with the short TI inversion recovery technique.
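The acquisition settings above imply a few quantities that are easy to derive. The sketch below, using hypothetical helper names, computes the contrast volume and bolus duration for a given body weight and the nominal voxel size of each series. Two assumptions are not stated in the study: that Magnevist contains 0.5 mmol/mL gadopentetate (so 0.2 mL/kg corresponds to the standard 0.1 mmol/kg), and that the acquired matrix equals the reconstructed matrix over a square FOV.

```python
"""Arithmetic sketch for the MRI acquisition settings (hypothetical helpers)."""

# Assumption: Magnevist (gadopentetate dimeglumine) concentration of 0.5 mmol/mL
GD_DTPA_MMOL_PER_ML = 0.5


def contrast_plan(weight_kg, dose_ml_per_kg=0.2, rate_ml_per_s=2.0):
    """Return (volume in mL, gadolinium dose in mmol, bolus duration in s)."""
    volume_ml = weight_kg * dose_ml_per_kg
    dose_mmol = volume_ml * GD_DTPA_MMOL_PER_ML
    bolus_s = volume_ml / rate_ml_per_s
    return volume_ml, dose_mmol, bolus_s


def voxel_size_mm(fov_cm, matrix_x, matrix_y, slice_mm):
    """Nominal voxel size (dx, dy, dz) in mm and volume in mm^3, assuming a square FOV."""
    fov_mm = fov_cm * 10.0
    dx, dy = fov_mm / matrix_x, fov_mm / matrix_y
    return dx, dy, slice_mm, dx * dy * slice_mm


if __name__ == "__main__":
    print(contrast_plan(60.0))               # illustrative 60 kg patient
    print(voxel_size_mm(36, 360, 360, 1.1))  # dynamic T1-weighted series
    print(voxel_size_mm(36, 320, 192, 3.0))  # T2W-STIR series
```

Under these assumptions, the dynamic T1-weighted series has a nominal voxel of about 1.0 × 1.0 × 1.1 mm, whereas T2W-STIR voxels are roughly 1.1 × 1.9 × 3.0 mm, so the T1-weighted data have the finer nominal resolution.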

Clinical and Conventional MRI Features Assessment
Clinical and conventional MRI features (CCMF) were used to differentiate PTs from FAs. The clinical variables assessed included age, rapid enlargement of the lesion, and whether the lesion was primary or recurrent. The conventional MRI features of each patient were independently reviewed by two radiologists with 12 and 5 years of experience, respectively, who were blinded to the histopathological diagnoses. Discrepancies between the two radiologists were resolved by joint review to reach a consensus for further analysis. Three conventional MRI features were interpreted according to the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) MR imaging criteria (version 5) (22,23): mass margin (circumscribed vs. non-circumscribed), initial signal intensity enhancement (slow, medium, or fast), and time-intensity curve (TIC) pattern on dynamic contrast-enhanced images (persistent, plateau, or washout). The presence or absence of a cystic component and of internal septation was recorded, and lobulation was graded as strong (acute angles) or weak (obtuse angles). In addition, we analyzed tumor size (greatest lesion diameter), tumor signal intensity on T2W-STIR (homogeneous vs. heterogeneous), and the enhancement pattern on the third post-contrast sequence (homogeneous vs. heterogeneous). For enhancement measurements, including initial signal intensity enhancement and TIC, the region of interest (ROI) was placed on the area of strongest enhancement on the first post-contrast image. Examples of these MRI features are shown in Figure 1. For patients with recurrence, only clinical and MRI data at the time of recurrence were included and evaluated in this study.
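The kinetic categories above can be derived from mean ROI signal at three time points. The sketch below shows one way to do this. The BI-RADS lexicon names the categories, but the numeric cut-offs used here (50% and 100% for the initial phase, ±10% for the delayed phase) are common conventions assumed for illustration, not values reported by the study. Treating the 90 s acquisition as "early" and the 630 s acquisition as "late" is also an assumption.

```python
"""Sketch of kinetic-curve categorization for an ROI (assumed cut-offs)."""


def percent_change(reference, value):
    """Relative signal change in percent."""
    return 100.0 * (value - reference) / reference


def initial_phase(s_pre, s_early, slow_below=50.0, fast_above=100.0):
    """Classify initial enhancement from pre-contrast and early post-contrast signal."""
    enhancement = percent_change(s_pre, s_early)
    if enhancement < slow_below:
        return "slow"
    if enhancement <= fast_above:
        return "medium"
    return "fast"


def delayed_phase(s_early, s_late, tolerance=10.0):
    """Classify the TIC shape from early to late post-contrast signal."""
    change = percent_change(s_early, s_late)
    if change > tolerance:
        return "persistent"
    if change < -tolerance:
        return "washout"
    return "plateau"


if __name__ == "__main__":
    # Hypothetical ROI means: pre-contrast, 90 s (early), 630 s (late)
    s_pre, s_90, s_630 = 100.0, 220.0, 180.0
    print(initial_phase(s_pre, s_90), delayed_phase(s_90, s_630))
```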

Texture Analysis
The T2W-STIR, T1-weighted pre-contrast, and two contrast-enhanced series were chosen for texture analysis. For each series, the slice showing the largest lesion diameter was selected. An ROI containing the entire visible tumor and excluding equivocal normal breast tissue was drawn manually on each image. Texture analysis was performed using MaZda software (24,25), which allows quantitative analysis of approximately 300 texture features based on the following algorithms: histogram, absolute gradient, run-length matrix, co-occurrence matrix, autoregressive model, and wavelet transform (Table 1). All these texture features were calculated for each ROI. Co-occurrence matrix parameters were calculated in four directions (θ = 0°, 45°, 90°, and 135°) at interpixel distances of n = 1, 2, 3, 4, and 5. To minimize the effects of brightness and contrast variation, gray levels were normalized to the range µ ± 3σ (µ, gray-level mean; σ, gray-level standard deviation).
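The normalization and co-occurrence steps above can be sketched in NumPy. This is not MaZda's implementation. It is an illustration under stated assumptions: intensities are clipped to µ ± 3σ of the ROI and quantized to 64 gray levels, matrices are symmetric and count only pixel pairs lying inside the ROI, and only three Haralick-type features (contrast, angular second moment, entropy) are computed for each of the 4 directions × 5 distances. MaZda derives more features per matrix.

```python
"""Sketch: mu +/- 3 sigma normalization and co-occurrence (GLCM) features."""
import numpy as np

ANGLES = (0, 45, 90, 135)


def normalize_mu_3sigma(image, mask, levels=64):
    """Clip to ROI mu +/- 3 sigma and quantize to 0..levels-1 (levels is an assumption)."""
    roi = image[mask].astype(float)
    mu, sigma = roi.mean(), roi.std()
    if sigma == 0:
        return np.zeros(image.shape, dtype=int)
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    scaled = (np.clip(image.astype(float), lo, hi) - lo) / (hi - lo)
    return np.clip(np.floor(scaled * levels).astype(int), 0, levels - 1)


def offset(distance, angle):
    """(row, col) step; 0 deg points right, 90 deg points up (rows grow downward)."""
    return {0: (0, distance), 45: (-distance, distance),
            90: (-distance, 0), 135: (-distance, -distance)}[angle]


def cooccurrence(q, mask, distance, angle, levels):
    """Normalized symmetric co-occurrence matrix over pixel pairs inside the ROI."""
    dr, dc = offset(distance, angle)
    rows, cols = q.shape
    r0, r1 = max(0, -dr), rows - max(0, dr)
    c0, c1 = max(0, -dc), cols - max(0, dc)
    P = np.zeros((levels, levels))
    if r1 <= r0 or c1 <= c0:  # offset larger than the image
        return P
    a, b = q[r0:r1, c0:c1], q[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    valid = mask[r0:r1, c0:c1] & mask[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    np.add.at(P, (a[valid], b[valid]), 1)
    P = P + P.T  # count each pair in both directions
    total = P.sum()
    return P / total if total > 0 else P


def glcm_features(image, mask, levels=64, distances=(1, 2, 3, 4, 5)):
    """Contrast, angular second moment and entropy for every (distance, angle)."""
    q = normalize_mu_3sigma(image, mask, levels)
    i, j = np.indices((levels, levels))
    feats = {}
    for d in distances:
        for ang in ANGLES:
            P = cooccurrence(q, mask, d, ang, levels)
            nz = P[P > 0]
            feats[(d, ang, "contrast")] = float(np.sum(P * (i - j) ** 2))
            feats[(d, ang, "asm")] = float(np.sum(P ** 2))
            feats[(d, ang, "entropy")] = float(-np.sum(nz * np.log2(nz)))
    return feats


if __name__ == "__main__":
    img = np.tile(np.array([[0.0], [1.0]]), (4, 8))  # 8 x 8 horizontal stripes
    roi = np.ones(img.shape, dtype=bool)
    f = glcm_features(img, roi)
    print(f[(1, 0, "contrast")], f[(1, 90, "contrast")], f[(2, 90, "contrast")])
```

On the striped test image, horizontal neighbors are identical (zero contrast at 0°), vertical neighbors one row apart always differ, and neighbors two rows apart match again. This shows how direction and distance make the co-occurrence features sensitive to oriented structure.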
In MaZda, a combination of feature selection methods, namely mutual information, classification error probability combined with average correlation coefficients, and the Fisher coefficient, was applied to select the 30 texture parameters with the highest discriminative power on each MRI pulse sequence. These features were then exported to the statistical program B11 (24) for further processing and classification.
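Of the three selection criteria above, the Fisher coefficient is the simplest to illustrate. The sketch below ranks features by a two-class Fisher ratio and keeps the top k (30 in the study). It assumes the form F = (mean_PT − mean_FA)² / (var_PT + var_FA). MaZda's Fisher coefficient is a between-class to within-class variance ratio of this kind, but its exact weighting may differ. The mutual-information and POE+ACC criteria are not shown.

```python
"""Sketch: rank texture features by a two-class Fisher criterion (assumed form)."""
import numpy as np


def fisher_scores(X, y, eps=1e-12):
    """Fisher ratio per column of X (n_samples x n_features); y holds 0/1 labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 0], X[y == 1]
    between = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    within = a.var(axis=0) + b.var(axis=0)
    return between / (within + eps)  # eps avoids division by zero for constant features


def top_k_features(X, y, k=30):
    """Column indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    return [int(i) for i in np.argsort(scores)[::-1][:k]]


if __name__ == "__main__":
    X = [[0.0, 1.0], [0.1, 2.0], [0.2, 3.0],   # class 0 (e.g. FA)
         [5.0, 1.0], [5.1, 2.0], [5.2, 3.0]]   # class 1 (e.g. PT)
    y = [0, 0, 0, 1, 1, 1]
    print(top_k_features(X, y, k=1))
```

A univariate criterion like this scores each feature on its own. That is why it is typically paired with a redundancy-aware criterion such as POE+ACC, which penalizes features that are highly correlated with ones already selected.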

Feature Selection and Classification
Features that differed significantly between groups were first selected from the raw texture features on each MRI sequence and from the CCMF. Linear discriminant analysis was then performed on these statistically significant features using MaZda to obtain the most discriminative features (26). A K-nearest neighbor classifier (K = 3) was then used to distinguish PT from FA based on the most discriminative features, using software routines written in MATLAB 7 (MathWorks). The classifier was trained on 28 PTs and 28 FAs and tested on the remaining 14 PTs and 14 FAs.
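The two-step classification above can be sketched as a Fisher LDA projection followed by a 3-nearest-neighbor vote. The sketch runs on synthetic stand-in data, not on the study's features, and uses hypothetical helper names. Several details are assumptions the paper does not specify: a stratified random split of 28 + 28 training and 14 + 14 test cases, LDA fitted on the training set only, a small ridge term for numerical stability, and distance measured on the 1-D discriminant score. The study itself used MaZda/B11 and MATLAB routines.

```python
"""Sketch: two-class Fisher LDA projection followed by a 3-nearest-neighbor classifier."""
import numpy as np


def split_per_class(y, n_train, rng):
    """Return (train_idx, test_idx) with n_train cases of each class in training."""
    train, test = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(train), np.array(test)


def fit_lda(X, y, ridge=1e-6):
    """Fisher discriminant direction w = Sw^-1 (m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw += ridge * np.eye(X.shape[1])
    return np.linalg.solve(Sw, m1 - m0)


def knn_predict(z_train, y_train, z_test, k=3):
    """Majority vote among the k nearest training scores; also returns the vote fraction."""
    dist = np.abs(z_test[:, None] - z_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    votes = y_train[nearest].mean(axis=1)  # fraction of neighbors labeled 1 (PT)
    return (votes > 0.5).astype(int), votes


def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    # Synthetic stand-in data: 42 "FA" (0) and 42 "PT" (1) cases with 5 features
    X = np.vstack([rng.normal(0.0, 1.0, (42, 5)), rng.normal(4.0, 1.0, (42, 5))])
    y = np.array([0] * 42 + [1] * 42)
    tr, te = split_per_class(y, 28, rng)
    w = fit_lda(X[tr], y[tr])
    pred, _ = knn_predict(X[tr] @ w, y[tr], X[te] @ w, k=3)
    return float(np.mean(pred == y[te])), len(te)


if __name__ == "__main__":
    print(run_demo())
```

Fitting the projection on the training cases alone keeps test lesions from influencing the discriminant direction. With K = 3 and two classes, a vote can never tie.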
A workflow chart for distinguishing PT from FA based on texture features and CCMF is illustrated in Figure 2.

Statistical Analysis
We compared the raw texture parameters on each sequence and the CCMF between PTs and FAs using the Mann-Whitney U test. Receiver operating characteristic (ROC) analysis was used to evaluate the diagnostic efficiency of each approach. These analyses were performed using SPSS 22.0 for Windows. The areas under the ROC curves (AUCs) were compared using the method described by Hanley (27). P < 0.05 was considered to indicate statistical significance.
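The Mann-Whitney U statistic and the AUC are closely linked: AUC = U / (n_PT × n_FA), the fraction of (PT, FA) pairs that a score ranks correctly. The sketch below computes the AUC this way, estimates its standard error with the Hanley-McNeil (1982) formula, and compares two AUCs with a z-test. One assumption needs flagging: the z-test treats the two AUCs as independent. When both curves come from the same lesions, the AUCs are correlated, and Hanley and McNeil's method for correlated curves subtracts a correlation term that this sketch omits, so its p-values are not those of the study.

```python
"""Sketch: AUC from the Mann-Whitney statistic and a Hanley-McNeil standard error."""
import math


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC = U / (n_pos * n_neg): share of (PT, FA) pairs ranked correctly, ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (auc * (1.0 - auc) + (n_pos - 1) * (q1 - auc * auc)
           + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    return math.sqrt(var)


def compare_independent_aucs(auc1, n_pos1, n_neg1, auc2, n_pos2, n_neg2):
    """Two-sided z-test that ignores the correlation between the two curves."""
    se1 = hanley_mcneil_se(auc1, n_pos1, n_neg1)
    se2 = hanley_mcneil_se(auc2, n_pos2, n_neg2)
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    print(auc_mann_whitney([0.9, 0.8, 0.4], [0.3, 0.4, 0.1]))
    # Illustrative inputs only: reported AUCs of the combination and of CCMF, 42 PTs / 42 FAs
    print(compare_independent_aucs(0.95, 42, 42, 0.76, 42, 42))
```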

Clinical and Conventional MRI Features
The clinical characteristics and conventional MRI findings of the 42 PTs and 42 FAs are summarized in Table 2. Based on histological findings, there were 25 benign, 14 borderline, and 3 malignant PTs. Patients with PTs were significantly older than those with FAs (p < 0.001). The mean maximal diameter (± standard deviation) was 4.70 ± 3.45 cm in the PT group and 3.48 ± 2.36 cm in the FA group; PTs tended to be larger than FAs, although the difference was not statistically significant (p > 0.05). The local recurrence rates were 26.2% for PTs and 0% for FAs. Strong lobulation was seen more frequently in PTs (26/42, 61.9%) than in FAs (13/42, 31.0%) (p = 0.004). Cystic components were seen in 20 (47.6%) PTs but in only 6 (14.3%) FAs (p = 0.001).
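The counts above can be checked with simple arithmetic on 2 × 2 tables. The sketch below applies a Pearson chi-square test (1 degree of freedom, no continuity correction) to the lobulation and cystic-component counts. This test is an assumption chosen only as a quick check; the study reports group comparisons with the Mann-Whitney U test. For these counts it gives p-values of roughly 0.004 and 0.001, in line with the values reported above.

```python
"""Sketch: Pearson chi-square (1 df, no continuity correction) for 2 x 2 counts."""
import math


def chi2_2x2(a, b, c, d):
    """Rows: PT (a present, b absent), FA (c present, d absent)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p


if __name__ == "__main__":
    print(chi2_2x2(26, 42 - 26, 13, 42 - 13))  # strong lobulation
    print(chi2_2x2(20, 42 - 20, 6, 42 - 6))    # cystic components
```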
PTs had a significantly higher frequency of internal septation than FAs (p = 0.009). FAs appeared homogeneous on T2W-STIR more frequently than PTs (p = 0.001). There were no significant differences between PTs and FAs in rapid growth, margin, septation enhancement, tumor signal intensity on the third post-contrast images, initial signal intensity enhancement, or TIC pattern.
Using CCMF, the K-nearest neighbor classifier achieved a classification accuracy of 76.2%. In ROC analysis, the AUC was 0.76 (95% CI: 0.66, 0.87), and the sensitivity and specificity were both 76.2%.

Texture Features
PTs and FAs showed different textural patterns. Several texture features extracted using MaZda differed significantly between the two groups (Table 3 and Supplementary Information). T2W-STIR yielded more statistically significant texture features than the other MRI sequences. Using texture features, the classification accuracies were 89.3%, 69.1%, 71.4%, and 67.9% for T2W-STIR, T1-weighted pre-contrast, and the first and third post-contrast sequences, respectively. In ROC analysis, the AUCs were 0.89 (95% CI: 0.82, 0.97), 0.69 (95% CI: 0.58, 0.81), 0.71 (95% CI: 0.60, 0.83), and 0.68 (95% CI: 0.56, 0.80) for T2W-STIR, T1-weighted pre-contrast, and the first and third post-contrast sequences, respectively. The most discriminative features on T2W-STIR thus had the highest classification accuracy (89.3%) and AUC (0.89, 95% CI: 0.82, 0.97). Figure 3 shows the ROC curves of the K-nearest neighbor classifier trained with the most discriminative features of CCMF, of texture features on each MRI sequence, and of the combination strategy. The texture features on T2W-STIR, with higher classification accuracy (89.3 vs. 76.2%) and  Figure 4), and exhibited lower specificity (76.2 vs. 90.5%) resulting in a few false positive results (example shown in Figure 5). The AUC of the combination was significantly higher than that of CCMF or texture features on T2W-STIR alone (p < 0.05). Based on the K-nearest neighbor classification accuracy and AUC, the combination performed better than texture features on T2W-STIR or CCMF alone.
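Accuracy and AUC nearly coincide throughout these results (76.2% vs. 0.76, 89.3% vs. 0.89, 95.2% vs. 0.95). One possible explanation, offered as an assumption because the paper does not say how the ROC curves were built, is that each curve came from the hard K-nearest neighbor labels rather than from vote fractions. Such a curve has a single operating point at (1 − Sp, Se), so its trapezoidal AUC reduces to the mean of sensitivity (Se) and specificity (Sp). With equal numbers of PTs and FAs, accuracy reduces to the same quantity:

```latex
\mathrm{AUC} = \tfrac{1}{2}(1-\mathrm{Sp})\,\mathrm{Se} + \mathrm{Sp}\,\frac{\mathrm{Se}+1}{2}
             = \frac{\mathrm{Se}+\mathrm{Sp}}{2},
\qquad
\mathrm{Acc} = \frac{\mathrm{TP}+\mathrm{TN}}{n_{\mathrm{PT}}+n_{\mathrm{FA}}}
             \;\overset{n_{\mathrm{PT}}=n_{\mathrm{FA}}}{=}\; \frac{\mathrm{Se}+\mathrm{Sp}}{2}.
```

For example, Se = Sp = 0.952 gives both an accuracy of 95.2% and an AUC of 0.952.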

DISCUSSION
In the current study, MRI-based texture analysis was applied to the differential diagnosis of breast PTs and FAs. Texture features on T2W-STIR achieved higher classification accuracy and AUC than clinical and conventional MRI features (CCMF). Texture features on T2W-STIR were more sensitive than CCMF, whereas CCMF exhibited higher specificity. In our study, PTs that were mistaken for FAs using CCMF were correctly identified using texture features on T2W-STIR, regardless of whether the lesion was benign or malignant. In addition, the combination of CCMF with texture features on T2W-STIR yielded better diagnostic performance than either CCMF or T2W-STIR texture features alone. When the most discriminative features of the combination strategy were used to train the classifier, the classification accuracy reached 95.2%, and the AUC, specificity, and sensitivity were 0.95, 95.2%, and 95.2%, respectively.
Prior studies (15, 28–30) have reported differences in clinical and conventional MRI characteristics between PTs and FAs, and such differences were also observed in this study. In line with these studies (15, 28–30), older age, recurrence, strong lobulation, and internal cystic components were significantly more frequent in PTs than in FAs. A previous study reported that hypointense internal septation was more common in PTs than in FAs, although the difference was not statistically significant (15). In this study, PTs showed hypointense internal septations significantly more frequently than FAs on MRI, but septation enhancement did not differ significantly between the two groups. As reported in some articles (8,15), PTs were frequently larger than FAs, but tumor size did not differ significantly between them in our study, likely because the included tumors were relatively large (>1 cm in diameter). Kamitani et al. (15) found that FAs appeared homogeneous on post-contrast T1-weighted images more frequently than PTs, although this difference was not significant, and that the TIC pattern did not differ significantly between the two groups; both findings are consistent with our results. In addition, we found a significantly higher frequency of heterogeneous signal in PTs than in FAs on T2W-STIR, a finding rarely mentioned in the previous literature (15).
Both PTs and FAs are breast fibroepithelial neoplasms. Histologically, they share a dimorphic pattern with both epithelial and stromal components. However, PT can usually be differentiated from FA by its exaggerated intracanalicular growth pattern with increased and heterogeneous stromal cellularity (9). Internal cystic components, septation, and heterogeneous signal on T2W-STIR may be caused by the histopathologically heterogeneous nature of PT, and the stronger lobulation might be related to the rapid growth.
Texture analysis was used to evaluate the ability to differentiate PTs from FAs. More texture features differed significantly between groups on T2W-STIR than on the T1-weighted pre-contrast and two contrast-enhanced series. Furthermore, the diagnostic performance of the statistically significant texture features on T2W-STIR exceeded that on the other MRI sequences, with an AUC of 0.89 and a classification accuracy of 89.3%. This result agreed with the conventional MRI assessment: signal intensity on T2W-STIR differed significantly between PTs and FAs, whereas post-contrast features, such as tumor signal intensity on the third post-contrast images, initial signal intensity enhancement, and TIC pattern, did not. Previous breast MRI studies have mainly focused on dynamic contrast-enhanced sequences, probably because of the detailed morphological and hemodynamic information they provide; in our study, however, dynamic contrast-enhanced MRI was less useful than T2W-STIR for distinguishing PTs from FAs. The relatively long echo time of T2W-STIR provides strong T2-weighted soft-tissue contrast, and the STIR technique suppresses fat signal. Hence, we hypothesized that texture analysis of T2W-STIR images might reveal more subtle alterations in the tumor microenvironment: textural features extracted from T2W-STIR may capture intratumoral heterogeneity and thus reflect more of the differences between PTs and FAs.
In our study, the mean lesion size exceeded 3 cm for both PTs and FAs. Lesions larger than 3 cm are generally resected regardless of type (8), but PTs and FAs require different surgical procedures: FAs need only enucleation, whereas both benign and malignant PTs require wide local excision with a margin of at least 1 cm (10,11) because of the high recurrence rate when resection margins are <1 cm (10,13). Combining CCMF with texture features on T2W-STIR can provide an accurate preoperative diagnosis for lesions of this size, allowing an appropriate clinical strategy and avoiding operative complications resulting from inadequate excision or surgical overtreatment.
Our study has several limitations. First, some MRI examinations were performed after fine-needle aspiration of the primary tumor, so biopsy-related hemorrhage or edema could have affected feature calculation. Second, we did not compare PTs across histologic grades because of the small number of borderline and malignant PTs. Third, the pathophysiological meaning of textural features remains poorly understood, and additional work is needed to relate them to the underlying biology of these tumors. Last, as a retrospective study with a small sample of 84 lesions, inherent variations and biases may have influenced the results. Further validation with larger datasets from different centers and scanners is needed.
In conclusion, textural features extracted from T2W-STIR showed better diagnostic performance than CCMF, and the combination of CCMF with texture features on T2W-STIR performed better than either CCMF or T2W-STIR texture features alone. Texture analysis provides a novel approach to distinguishing PTs from FAs non-invasively and accurately. With further validation, MRI-based texture features may become a potential biomarker and a useful medical decision tool in clinical trials involving patients with breast fibroepithelial neoplasms.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the manuscript/Supplementary Files.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board of Guangdong Women and Children Hospital. The patients/participants provided written informed consent to participate in this study.