Differentiation Between G1 and G2/G3 Phyllodes Tumors of Breast Using Mammography and Mammographic Texture Analysis

Purpose: To determine the potential of mammography (MG) and mammographic texture analysis in differentiating Grade 1 (G1) from Grade 2/Grade 3 (G2/G3) phyllodes tumors (PTs) of the breast. Materials and methods: A total of 80 female patients with histologically proven PTs were included in this study. Forty-five subjects who underwent pretreatment MG from 2010 to 2017 were retrospectively analyzed, including 14 PTs G1 and 31 PTs G2/G3. Tumor size, shape, margin, density, homogeneity, the presence of fat or calcifications, a halo sign, and some indirect manifestations were evaluated. Texture features were extracted using commercial software. The receiver operating characteristic (ROC) curve was used to determine the sensitivity and specificity of prediction. Results: Large size (>4.0 cm) was more frequent in G2/G3 PTs than in PTs G1 (64.52 vs. 28.57%, p = 0.025). Strong lobulation or a multinodular confluent shape was more common in G2/G3 PTs than in PTs G1 (64.52 vs. 14.29%, p = 0.004). Significant differences were also observed in tumor growth speed and clinical manifestations (p = 0.007 and 0.022, respectively). Ten texture features showed significant differences between the two groups (p < 0.05); Correlation_AllDirection_offset7_SD and ClusterProminence_AllDirection_offset7_SD were independent risk factors. The areas under the curve (AUC) of imaging-based diagnosis, texture analysis-based diagnosis, and the combination of the two approaches were 0.805, 0.730, and 0.843, respectively; the combined approach had 90.3% sensitivity and 85.7% specificity. Conclusions: Texture analysis has great potential to improve the diagnostic efficacy of MG in differentiating PTs G1 from PTs G2/G3.


INTRODUCTION
Phyllodes tumors (PTs) are rare breast fibroepithelial neoplasms that account for <1% of all breast tumors (1, 2) and 2-3% of all fibroepithelial breast lesions (3,4). PTs were originally described in 1838 as "cystosarcoma phyllodes" because of their leaf-like pattern of growth and internal cystic degeneration. PTs usually show benign biological behavior. However, approximately 20-30% of resected PTs are malignant, and approximately 25% of malignant ones show metastatic features (5). The most widely accepted grading system is the World Health Organization (WHO) three-tiered classification, in which PTs are classified as benign, borderline, or malignant based on the semiquantitative evaluation of key histologic findings: stromal cellularity, stromal atypia, stromal mitosis, and stromal overgrowth (6).
PTs may occur in any age group from adolescents to the elderly but are most common in women aged between 35 and 55 years (1,4). Surgical resection is the fundamental treatment for PTs, but the surgical approach is generally selected based on the histologic grade. Wide excision or mastectomy is usually performed for Grade 2 (G2)/Grade 3 (G3) PTs (7-9). Therefore, preoperative differentiation between PTs G1 and G2/G3 would be especially useful for surgical planning. Fine-needle biopsy is considered a highly accurate technique for diagnosing PTs. However, it is not suitable for grading PTs because of inadequate cytologic samples and the heterogeneous tissue composition of PTs (10,11).
Various radiologic methods, including mammography (MG), ultrasound (US), and magnetic resonance imaging (MRI), have been used to grade PTs preoperatively (12). MG and US have shown limited potential in predicting PT grades. MRI may be a useful imaging approach. However, some patients cannot undergo MRI because of biomedical metal stents or contraceptive ring implants, which are very common among Chinese women. In addition, MRI is expensive and time-consuming. Therefore, surgeons prefer to proceed directly to surgery after US and MG examinations. It would be valuable to find a way to improve the diagnostic performance of MG or US.
Recently, artificial intelligence (AI) technology, radiomics, and computer-aided texture analysis have been used for diagnosis, treatment response assessment, and prognosis evaluation in cancer patients. However, few studies to date have combined mammography with mammographic texture analysis to grade PTs. The purpose of this study was to determine the diagnostic performance of mammography and mammographic texture analysis in differentiating G1 from G2/G3 PTs.

MATERIALS AND METHODS
The Declaration of Helsinki was adhered to throughout the entire study. The protocol was approved by the Institutional Review Board of the Affiliated Hospital of Nanjing University of Chinese Medicine. The need for informed consent was waived by the Institutional Review Board, due to the nature of this retrospective study.

Patients
From February 2010 to October 2017, we obtained data from our data warehouse on 80 female patients with surgically proven primary PTs. The patients' ages ranged from 25 to 70 years (mean 46.58 ± 9.54). The inclusion criteria were as follows: (1) patients with surgically proven primary PTs; (2) patients who did not undergo any treatment before surgery; (3) patients who underwent preoperative mammography; (4) patients with a visible lesion on the mammography images. Thirty-five cases were excluded because of the absence of MG examination (n = 30) or negative MG findings (n = 5), leaving a total of 45 patients in this study (Figure 1). According to the WHO 2012 classification for PTs, the PTs were divided into G1, G2, and G3 in this study. We obtained information about tumor growth speed by reviewing each patient's previous images (including mammography, ultrasound, and MRI) or by asking about the patient's own perception. A tumor whose diameter doubled within half a year was defined as a rapid-growth tumor; all others were defined as slow-growth tumors. Tactility was graded as hard (like the forehead), medium (like the nose), or soft (like the lips).

Mammography Examinations and Images Analysis
Bilateral digital MG examinations were performed using the GIOTTOIMAGE 3D (IMS, Bologna, ITA) in fully automatic exposure control mode, with routine craniocaudal (CC) and mediolateral oblique (MLO) views. The DICOM images were obtained from the Picture Archiving and Communication System (PACS). Two radiologists (>8 years' experience in mammography), who were blinded to pathological findings, analyzed the images. The following imaging information was evaluated: tumor size, margin (well-defined or ill-defined border), shape (oval, weak lobulation, or strong lobulation/multinodular confluent), density (hypodensity, isodensity, or hyperdensity), homogeneity (yes or no), the presence of fat or calcifications, and the presence of a halo sign (a low-density fat ring caused by the tumor pushing against surrounding structures). In addition, some indirect manifestations were evaluated, including the American College of Radiology (ACR) breast composition category, skin thickening, venectasia, and axillary lymphadenectasis (short diameter >1 cm). The size of the tumor was determined based on the maximum diameter in either the CC or the MLO image. For quantitative data, we calculated the mean of the two readers' measurements. For qualitative data, the final imaging features were confirmed when the two readers reached a consensus.

Mammographic Texture Analysis
Regions of interest (ROIs) were drawn manually to delineate the lesions using ITK-SNAP software. Since PTs have envelopes and the display rate of a halo ring was as high as 91.11% (41/45) in this study, we outlined the tumor ROIs using the halo ring as the boundary. All the DICOM images and ROIs were individually transferred to the texture analysis software package (Artificial Intelligent Kit, A.K., GE Healthcare). Subsequently, texture features were automatically calculated by the A.K. software package. The texture analysis was performed twice for each lesion, and the mean values of the texture features were calculated. The procedure is shown in Figure 2. Three categories of statistical methods were used: histogram, gray-level co-occurrence matrix (GLCM), and run-length matrix (RLM). A total of 435 texture features were extracted from each image in our study.

Statistical Analysis
Statistical analyses were performed using IBM SPSS version 22.0 (IBM Corporation, New York). Quantitative data were displayed as mean ± SD. The independent-samples t-test and the Mann-Whitney U-test were used for data with normal and non-normal distributions, respectively. Categorical data were shown as percentages and were analyzed using the Chi-square test or Fisher's exact test. Spearman correlation analysis and logistic regression were used to show the relationship between texture features and tumor grade. P < 0.05 was considered statistically significant. The receiver operating characteristic (ROC) curve was used to determine the diagnostic sensitivity and specificity of mammography and mammographic texture analysis.
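The ROC step described above can be illustrated with a minimal sketch. The study itself used SPSS; the code below is only an assumed equivalent that computes the AUC through the Mann-Whitney formulation and picks a cut-off by Youden's index. The function names, the choice of Youden's index, and the scores are hypothetical and are not study data.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

def best_cutoff(scores, labels):
    """Cut-off maximizing Youden's J = sensitivity + specificity - 1."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = None
    for c in np.unique(scores):
        pred = scores >= c  # predicted G2/G3 when the score reaches the cut-off
        sens = (pred & labels).sum() / labels.sum()
        spec = (~pred & ~labels).sum() / (~labels).sum()
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, float(c), float(sens), float(spec))
    return best[1], best[2], best[3]

# Hypothetical predicted probabilities (1 = G2/G3, 0 = G1); NOT study data.
labels = [1, 1, 1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.65]
auc = roc_auc(scores, labels)
cutoff, sens, spec = best_cutoff(scores, labels)
print(f"AUC={auc:.3f}, cut-off={cutoff}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The rank-based AUC avoids building the curve explicitly, which keeps the sketch short; a plotted ROC curve would give the same area.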

RESULTS

Patients' Clinical Characteristics
The clinical characteristics of the 80 patients are summarized in Table 1. Each patient had only one lesion, in a unilateral breast. All patients underwent surgery. There were 21 benign (26.25%), 38 borderline (47.50%), and 21 malignant tumors (26.25%). Fifteen patients underwent local excision, 52 underwent wide excision, and 13 underwent mastectomy. Rapid growth (diameter doubling within half a year) was more frequent in PTs G2/G3 than in PTs G1 (47.46 vs. 14.28%, p = 0.007). PTs G2/G3 were more likely to cause pain and skin changes than PTs G1 (p = 0.022). No significant differences were found in stiffness or mobility. Tumor location did not differ significantly between the two groups, excluding 19 lesions that grew in the center of the breast or occupied the entire breast, for which the origin was difficult to judge.
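As a worked check of the growth-speed comparison, the sketch below applies an uncorrected Pearson chi-square test to a 2x2 table. The counts are an assumption reconstructed from the reported percentages and group sizes (47.46% of 59 G2/G3 tumors is 28; 14.28% of 21 G1 tumors is 3); the text does not state which test was applied to this comparison, so the choice of test is also an assumption. Under these assumptions the result is close to the reported p = 0.007.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Reconstructed counts (assumption, see lead-in):
# rows = G2/G3, G1; columns = rapid growth, slow growth.
chi2, p = chi_square_2x2(28, 31, 3, 18)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

All expected cell counts in this table exceed 5, which is the usual condition for preferring the chi-square test over Fisher's exact test.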

Mammography Findings
Subsequently, we evaluated the mammographic findings of the 45 patients who met the study criteria. (Figure 4). Some low-grade PTs showed an ill-defined margin caused by a masking effect, owing to their small size and a density equal to that of the surrounding gland (Figure 5). The evaluation of PT boundaries therefore had some limitations. There were no significant differences in density, homogeneity, or the presence of a halo ring, calcifications, or fat between PTs G1 and PTs G2/G3. Similar results were observed for the indirect manifestations (Table 3). The ROC curve was used to determine the diagnostic sensitivity and specificity of the mammography findings. The AUC was 0.805, with 64.5% sensitivity and 85.7% specificity (Figure 6A).

Mammographic Texture Analysis
A total of 435 texture features were extracted from the mammographic images. The texture features that differed significantly between PTs G1 and PTs G2/G3 are shown in Table 4. Spearman correlation analysis was then used to eliminate parameters that were strongly correlated with one another (Figure 7). Finally, logistic regression retained only two parameters in our model: Correlation_AllDirection_offset7_SD and ClusterProminence_AllDirection_offset7_SD.

Parameter 1: Correlation_AllDirection_offset7_SD
Correlation measures the similarity of the gray levels in neighboring pixels. Correlation_AllDirection_offset7_SD is one of the 18 Correlation-related parameters in the A.K. software.

Parameter 2: ClusterProminence_AllDirection_offset7_SD
Cluster prominence is a measure of the asymmetry of a given distribution. High values of this feature indicate that the symmetry of the image is low. In medical imaging, low values of cluster prominence represent a smaller peak in the image gray-level values, and the gray-level differences between structures are usually small.
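To make the two parameters concrete, the following sketch computes GLCM correlation and cluster prominence from their standard Haralick-style definitions. It rests on several assumptions that the text does not confirm: that "offset7" means a pixel distance of 7, that "AllDirection" covers the four usual directions (0, 45, 90, and 135 degrees), that "_SD" is the standard deviation of the feature across those directions, and that 16 gray levels are used. It also ignores the ROI mask and is not the A.K. software implementation.

```python
import numpy as np

def glcm(image, dr, dc, levels):
    """Symmetric, normalized gray-level co-occurrence matrix for the offset (dr, dc)."""
    img = np.asarray(image)
    rows, cols = img.shape
    # Slices selecting each pixel and its neighbor at (row + dr, col + dc).
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    src = img[r0:r1, c0:c1]
    dst = img[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (src.ravel(), dst.ravel()), 1.0)  # count co-occurring gray-level pairs
    m += m.T                                       # make the matrix symmetric
    return m / m.sum()

def correlation(p):
    """sum((i - mu_i)(j - mu_j) p(i, j)) / (sigma_i * sigma_j)"""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    return float(((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j))

def cluster_prominence(p):
    """sum((i + j - mu_i - mu_j)^4 p(i, j))"""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    return float((((i + j - mu_i - mu_j) ** 4) * p).sum())

def direction_sd(image, feature, offset=7, levels=16):
    """SD of a GLCM feature over four directions (assumed meaning of AllDirection_SD)."""
    directions = [(0, offset), (-offset, offset), (-offset, 0), (-offset, -offset)]
    values = [feature(glcm(image, dr, dc, levels)) for dr, dc in directions]
    return float(np.std(values))

# Hypothetical 64x64 patch already quantized to 16 gray levels; NOT study data.
rng = np.random.default_rng(0)
patch = rng.integers(0, 16, size=(64, 64))
corr_sd = direction_sd(patch, correlation)
cp_sd = direction_sd(patch, cluster_prominence)
print(corr_sd, cp_sd)
```

The patch must be larger than the offset in both dimensions; otherwise no pixel pairs exist and the matrix cannot be normalized.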
The texture features were associated with tumor grade (OR = 0.465, 95% CI: 0.231-0.936; OR = 0.042, 95% CI: 0.193-0.969, respectively). The ROC curve was used to determine the diagnostic sensitivity and specificity of mammographic texture analysis. The AUC was 0.730. At a cut-off value of 0.044, the sensitivity was 93.5% and the specificity was 50% (Figure 6B).
Subsequently, the ROC curve was also used to determine the diagnostic sensitivity and specificity of mammography findings combined with texture features. The AUC was 0.843, with 90.3% sensitivity and 85.7% specificity for predicting PTs G2/G3 (Figure 6C).
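The reported sensitivities and specificities can be translated back into approximate case counts, which makes the trade-off between the three models easier to see. The sketch below assumes the mammography cohort sizes given in the abstract (31 PTs G2/G3 and 14 PTs G1) and rounds to the nearest whole case; the counts are reconstructions, not figures reported by the study, and Youden's J is added only as a convenient summary.

```python
# Cohort sizes from the abstract: 31 PTs G2/G3 (positives) and 14 PTs G1 (negatives).
N_POS, N_NEG = 31, 14

# (sensitivity, specificity) as reported for each model.
REPORTED = {
    "imaging": (0.645, 0.857),
    "texture": (0.935, 0.500),
    "combined": (0.903, 0.857),
}

def summarize(sens, spec, n_pos=N_POS, n_neg=N_NEG):
    """Reconstruct approximate confusion-matrix counts and Youden's J."""
    tp = round(sens * n_pos)  # G2/G3 tumors correctly predicted
    tn = round(spec * n_neg)  # G1 tumors correctly predicted
    return {
        "TP": tp, "FN": n_pos - tp,
        "TN": tn, "FP": n_neg - tn,
        "youden_j": round(sens + spec - 1, 3),
    }

summary = {name: summarize(sens, spec) for name, (sens, spec) in REPORTED.items()}
for name, row in summary.items():
    print(name, row)
```

Under these assumptions the combined model misses about three G2/G3 tumors and misclassifies about two G1 tumors, whereas texture analysis alone misclassifies about half of the G1 tumors.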
Finally, we randomly selected 30 samples for internal validation, including nine benign (30%), 13 borderline (43.33%), and eight malignant tumors (26.67%). The AUC was 0.862 (85.7% sensitivity and 77.8% specificity). The validation results are similar to those of the preceding analysis, which suggests that the model is relatively stable.

DISCUSSION
Previous studies have indicated that imaging approaches are useful in differentiating PTs G1 from PTs G2/G3. In the present study, we evaluated the role of texture features in grading PTs. Our data indicate that texture features are useful in grading PTs and that texture analysis can improve the diagnostic performance in differentiating PTs G1 from PTs G2/G3.
Surgical methods are associated with the grades of PTs, so preoperative differentiation would be especially useful for surgical planning. Fine-needle biopsy is an accurate method for diagnosing PTs but cannot be used for classification because of inadequate cytologic samples and the heterogeneous nature of the tissue composition (10,11). It would therefore be helpful to evaluate PT grades using imaging approaches. However, radiologic studies of PT grading are very few because of the low incidence. Previous US, MG, and MRI studies indicated that a larger tumor size and an irregular tumor shape are more common in higher-grade tumors than in lower-grade tumors (9-15). Our data are consistent with those findings, and we found that a multinodular confluent shape was a characteristic imaging manifestation of PTs G2/G3. This is related to the degree of leaf-like growth in histology (2). On MRI, an irregular cyst wall, a tumor signal intensity lower than or equal to that of normal tissue on T2-weighted images, and a low apparent diffusion coefficient (ADC) are all significantly correlated with the histologic grade. The T1-weighted signal in G2/G3 PTs was higher than that in PTs G1 (12).
Recently, texture analysis has been widely used to evaluate tumor heterogeneity. Texture parameters, such as entropy and kurtosis, show good performance in differentiating benign from malignant tumors (16,17). Several studies also indicate that texture features are good predictors of tumor grade (18,19). However, few studies have examined the role of texture features in PT grading. To our knowledge, this is the first study to use mammographic texture analysis to grade PTs. Significant differences were found in 10 texture features, and Correlation_AllDirection_offset7_SD and ClusterProminence_AllDirection_offset7_SD were independent factors in distinguishing PTs G1 from PTs G2/G3. In addition, our data indicated that mammography achieves good specificity but poor sensitivity, whereas texture analysis achieves high sensitivity but poor specificity. Interestingly, the combination of the two approaches achieves both high sensitivity and high specificity. Texture analysis can therefore improve the efficacy of mammography for PT classification.
There are also several limitations in our study. First, since mammography produces a two-dimensional structural image, functional and three-dimensional structural information is absent, and texture analysis based on mammography may lose a lot of information. Second, as this was a retrospective study, selection bias cannot be avoided. Third, the number of patients in this study is small for a texture analysis study. There are two main reasons for the small number of cases: (1) the incidence of PTs is low, accounting for <1% of breast tumors, so cases are relatively difficult to collect; (2) texture analysis requires highly consistent imaging equipment and parameters, so some cases had to be excluded to ensure the accuracy of the texture analysis. Because of the relatively small sample size, all cases were included for texture feature extraction. We then performed internal validation to verify the results, aiming to improve the accuracy of the test set as much as possible under the existing conditions. Finally, we compared the texture analysis results obtained in this study with the previous literature and found that the two independent parameters we identified had been reported to be statistically significant in differentiating benign from malignant breast calcifications and in evaluating chemotherapy efficacy (20,21), which further supports the credibility of our results. In the future, we will expand the sample size to further improve the accuracy and repeatability of the study.
In conclusion, our data indicate that texture analysis based on mammography has the potential to differentiate PTs G2/G3 from PTs G1. Combining mammography findings with texture features provided the best predictions for PT classification in this study.

DATA AVAILABILITY
All datasets generated for this study are included in the manuscript and/or the supplementary files.

CONTRIBUTION TO THE FIELD STATEMENT
Phyllodes tumors (PTs) are rare breast fibroepithelial neoplasms that account for <1% of all breast tumors. PTs are classified as benign, borderline, and malignant. Preoperative differentiation between PTs G1 and G2/G3 would be especially useful for surgical planning, because wide excision or mastectomy is usually performed for Grade 2 (G2)/Grade 3 (G3) PTs. Fine-needle biopsy should not be used for PT grading because of inadequate cytologic samples and the heterogeneous nature of the tissue composition. MRI may be a useful imaging approach but has many contraindications and is costly. MG and US have shown limited potential in predicting PT grades; to our knowledge, our study is the first to combine mammography with mammographic texture analysis to grade PTs. The areas under the curve (AUC) of imaging-based diagnosis, texture-based diagnosis, and the combination of the two approaches were 0.805 (64.5% sensitivity and 85.7% specificity), 0.730 (93.5% sensitivity and 50% specificity), and 0.843 (90.3% sensitivity and 85.7% specificity), respectively. Texture analysis can effectively improve the efficacy of mammography for PT classification.