An ultrasound-based radiomics model to distinguish between sclerosing adenosis and invasive ductal carcinoma

Objectives We aimed to develop an ultrasound-based radiomics model to distinguish between sclerosing adenosis (SA) and invasive ductal carcinoma (IDC) to avoid misdiagnosis and unnecessary biopsies. Methods From January 2020 to March 2022, 345 pathologically confirmed cases of SA or IDC were included in the study. Clinical information and pre-surgical ultrasound (US) images were collected for all participants. The patients were randomly divided into a training cohort (n = 208) and a validation cohort (n = 137). The US images were imported into MaZda software (Version 4.2.6.0) to delineate the region of interest (ROI) and extract features. The intraclass correlation coefficient (ICC) was used to evaluate the consistency of the extracted features. Least absolute shrinkage and selection operator (LASSO) logistic regression with cross-validation was performed to select features and calculate the radiomics score. A model was then developed based on univariate and multivariate logistic regression analyses. Fifty-six cases from April 2022 to December 2022 were included for independent validation of the model. The diagnostic performance of the model and the radiomics score was evaluated by receiver operating characteristic (ROC) analysis. Calibration curves and decision curve analysis (DCA) were used to evaluate calibration and clinical usefulness, and leave-one-out cross-validation (LOOCV) was used to assess the stability of the model. Results Three predictors were selected to develop the model: the radiomics score, palpable mass, and BI-RADS category. In the training, validation, and independent validation cohorts, the areas under the curve (AUCs) of the model and the radiomics score were 0.978 and 0.907, 0.946 and 0.886, and 0.951 and 0.779, respectively. The AUC of the model differed significantly from that of the radiomics score (p<0.05). The kappa value of the model was 0.79 based on LOOCV.
The Brier score, calibration curve, and DCA showed that the model had good calibration and clinical usefulness. Conclusions The model based on radiomics, ultrasonic features, and clinical manifestations can distinguish SA from IDC and showed good stability and diagnostic performance. The model can be considered a potential candidate diagnostic tool for breast lesions and can contribute to effective clinical diagnosis.


Introduction
Sclerosing adenosis (SA) is a common benign lesion that may mimic breast malignancy clinically, radiologically, and pathologically (1-4). SA is usually asymptomatic or presents as a palpable mass, and it is often found incidentally in premenopausal women examined by imaging or histopathology for other reasons (2). Radiologically, SA is often assessed as malignant. Pathologically, SA is a complex proliferative lesion consisting of enlarged and distorted nodules that contain numerous crowded acini, accompanied by marked myoepithelial and interstitial fibrosis (5). Because SA often imitates malignancy, it leads to misdiagnosis and excessive biopsies, which harm women's physical and mental health. Invasive ductal carcinoma (IDC), the most common type of breast cancer, may coexist with SA, making the two difficult to distinguish (6). Yet their management differs: surgical resection is the main treatment for IDC because of its invasiveness and metastatic potential, whereas SA is managed with follow-up (7).
Conventional breast ultrasound (US) plays a key role in screening, diagnostic imaging, and interventional procedures for breast lesions. For patients, US is relatively quick, comfortable, inexpensive, and radiation-free. The American College of Radiology Breast Imaging Reporting and Data System (ACR BI-RADS) provides a standardized lexicon for describing US findings and a system for classifying these findings according to the probability of malignancy (8,9). However, both US and BI-RADS assessment depend on the subjective observations of radiologists. Therefore, a non-invasive and objective method for differentiating between benign and malignant lesions is needed.
Texture analysis extracts texture feature parameters from images using image processing algorithms, providing objective and quantitative information about lesions that cannot be identified by the naked eye (10,11). MaZda is a software package for 2D and 3D image texture analysis that provides a complete workflow for quantitative texture analysis. It has been applied to various imaging modalities, including X-ray, US, and magnetic resonance imaging, and has proven to be an efficient and reliable tool for quantitative image analysis, supporting more accurate and objective medical diagnoses (12-15). A logistic regression model integrates multiple predictors in a multivariable regression analysis to diagnose or predict the occurrence or progression of disease (16,17). To our knowledge, no model based on ultrasonic texture analysis has been used to distinguish between SA and IDC. We aimed to develop and validate an ultrasound-based radiomics model to differentiate between SA and IDC, which could serve as a potential candidate diagnostic tool for breast lesions and help to avoid misdiagnosis and unnecessary biopsies.

Study population
This retrospective study was approved by the Research Ethics Committee of the First Affiliated Hospital of Guangxi Medical University. We retrospectively reviewed the medical records of 345 consecutive female patients (345 lesions) treated at our hospital from January 2020 to March 2022, including 76 cases of SA and 269 cases of IDC. These patients were randomly divided into a training cohort (n=208, mean age: 51.3 ± 12.2 years) and a validation cohort (n=137, mean age: 51.5 ± 10.2 years), and the consistency between the two cohorts was tested. In addition, 56 patients treated at our hospital from April 2022 to December 2022, including 26 cases of SA and 30 cases of IDC, were included as an independent validation cohort (mean age: 48.3 ± 13.6 years).
The inclusion criteria were as follows: (1) breast US was performed before biopsy or surgery; (2) US images were available for qualitative and radiomic analysis; (3) SA or IDC was confirmed by biopsy or surgical pathology; (4) the patient had not received systemic hormone therapy or neoadjuvant chemotherapy; (5) the clinical information and US images were complete; and (6) for patients with multiple lesions, only the largest lesion or the lesion with the highest BI-RADS category was included.
The exclusion criteria were as follows: (1) US image quality was too poor for texture analysis; (2) the pathological result was indeterminate; (3) the patient had received systemic hormone therapy or neoadjuvant chemotherapy; (4) clinical information or US images were missing; and (5) the lesion was too large for the ROI to be delineated.
The flow chart of the study is shown in Figure 1.

Breast ultrasound technology
All patients underwent a pre-surgical US examination in the supine position, with their arms raised above their heads to fully expose the breasts. The color Doppler US systems used were the GE LOGIQ E9 or VOLUSON E9 (General Electric Company, Boston, USA) or the HITACHI ARIETTA 70 (HITACHI Ltd., Tokyo, Japan), each equipped with a linear array probe operating at 9-12 MHz.
The stored standard images of each breast lesion included at least two perpendicular sections, one of which showed the maximum diameter of the lesion, and the images demonstrating the lesion most clearly and completely were selected. The focus was placed slightly below the lesion, with the frequency set at 9-12 MHz. Each lesion was assigned a category (3, 4A, 4B, 4C, or 5) according to the 5th edition of ACR BI-RADS US. In ACR BI-RADS, category 4A indicates a low suspicion of malignancy, with a benign outcome far more likely than a malignant one. Following the relevant literature (18), lesions in BI-RADS 3 or 4A were considered negative (benign) and lesions in BI-RADS 4B, 4C, or 5 were considered positive (malignant) in our study. The recorded ultrasonic features of the breast lesions included maximum size, shape, echo pattern, echo distribution, boundary, orientation, posterior features, calcification, vascularity distribution, and associated features. All lesions were examined and evaluated by two ultrasound physicians, each with more than five years of experience in breast US; disagreements were resolved by discussion until a consensus was reached.
The maximum size was defined as the largest diameter of the tumor. Shape was classified as regular or irregular; echo pattern as hypoechoic or complex; echo distribution as uniform or non-uniform; boundary as well-circumscribed or obscure; orientation as parallel or not parallel to the chest wall; posterior acoustic features as attenuated or not attenuated; and vascularity distribution as absent or internal (1). Associated features included duct ectasia and palpable mass.
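The dichotomization of BI-RADS categories described above can be written as a simple lookup. The following Python sketch is illustrative only: the function name `birads_positive` and the string encoding of categories (for example "4A") are assumptions, not part of the study's software.

```python
# Hypothetical helper: BI-RADS US categories split into test-negative and
# test-positive groups, following the cut-off used in this study
# (3/4A negative, 4B/4C/5 positive).
NEGATIVE = {"3", "4A"}
POSITIVE = {"4B", "4C", "5"}


def birads_positive(category: str) -> bool:
    """Return True if a BI-RADS US category counts as positive (malignant)."""
    cat = category.strip().upper()
    if cat in POSITIVE:
        return True
    if cat in NEGATIVE:
        return False
    # Categories outside 3-5 were not part of this study's classification.
    raise ValueError(f"unsupported BI-RADS category: {category!r}")


if __name__ == "__main__":
    for c in ["3", "4A", "4B", "4C", "5"]:
        print(c, birads_positive(c))
```

Raising an error for other categories (for example 2) keeps the rule from silently classifying lesions the study never included.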

Pathological findings
The histopathological results of all lesions were obtained from the surgical pathology reports. Each specimen was fixed in formalin and processed using standard histopathological procedures. The final pathological diagnoses were made by experienced pathologists.

Radiomic analysis
The ROI was drawn on the section showing the largest diameter of the lesion by an ultrasound physician with more than ten years of experience in breast US, with its contour placed 0.1-0.2 cm inside the lesion margin. The gray-scale US images were imported into MaZda software (Version 4.2.6.0), and the ROIs were delineated manually (Figure 2). After normalization, MaZda computed a total of 279 descriptors characterizing the gray-scale image texture: nine histogram-based features, 11 co-occurrence matrix features (derived from 20 co-occurrence matrices computed for four directions and five inter-pixel distances), five run-length matrix features (each in four directions), five gradient-map features, five autoregressive-model features, and up to 20 Haar wavelet transform features (12).
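As a quick consistency check, the feature families listed above add up to the stated 279 descriptors. This short Python tally assumes that the wavelet family contributes its maximum of 20 features and that every co-occurrence and run-length feature is computed once per matrix or direction, as the text describes.

```python
# Tally of the MaZda descriptor counts as stated in the text.
# Assumption: "up to 20" Haar wavelet features is taken as exactly 20.
directions, distances = 4, 5

feature_counts = {
    "histogram": 9,
    "co-occurrence": 11 * (directions * distances),  # 11 features x 20 matrices
    "run-length": 5 * directions,                    # 5 features x 4 directions
    "gradient": 5,
    "autoregressive": 5,
    "haar_wavelet": 20,
}

total = sum(feature_counts.values())
print(total)  # 279
```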
To select features with good reproducibility and stability for building the model, 30 US images of breast lesions were randomly selected; the ROIs on these images were drawn by a second ultrasound physician with more than ten years of experience in breast US, and the features were extracted again. The intraclass correlation coefficient (ICC) was used to evaluate the agreement between features extracted from the ROIs drawn by the two physicians, and features with ICC ≥ 0.75 were considered to have good reproducibility and stability. LASSO logistic regression with cross-validation was then performed to select the significant features, which were used to construct the radiomics score.
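The text does not state which ICC form was used. The following pure-Python sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement, in the Shrout and Fleiss notation) and applies the 0.75 threshold described above; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for an n-subjects x k-raters table (assumed ICC form).

    Uses the two-way ANOVA mean squares: rows = subjects (lesions),
    columns = raters (the two physicians who drew the ROIs).
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-subject mean square
    msc = ss_cols / (k - 1)                 # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def reproducible_features(
    features: Dict[str, List[List[float]]], threshold: float = 0.75
) -> List[str]:
    """Names of features whose inter-observer ICC is >= threshold."""
    return [name for name, r in features.items() if icc_2_1(r) >= threshold]
```

For example, a feature measured as [[1, 2], [2, 3], [3, 4]] by two raters has perfectly parallel values but a constant offset, so the absolute-agreement ICC(2,1) is 2/3 rather than 1; a consistency-type ICC would ignore that offset, which is why the choice of form matters.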

Development and validation of the model
We conducted univariate and multivariate logistic regression analyses to identify factors distinguishing SA from IDC. The candidate factors included clinical information, ultrasonic features, BI-RADS category, and the radiomics score. In the training cohort, variables with p<0.05 in the univariate analysis were entered into the multivariate logistic regression to determine the independent risk factors for the model. The discrimination, calibration, and clinical usefulness of the model were then evaluated in the validation cohort. In addition, the logistic score of each patient in the independent validation cohort was calculated using our model. ROC curves were plotted to assess the diagnostic performance of the model (19), and the area under the ROC curve (AUC) was used to quantify discrimination. The calibration curve was used to examine the agreement between predicted probabilities and observed outcomes. Decision curve analysis (DCA) was performed to determine the clinical usefulness of the model (20). Leave-one-out cross-validation (LOOCV) was used to test the stability of the model, which was graded by the kappa value as very good (0.80 to 1.00), good (0.60 to 0.80), moderate (0.40 to 0.60), fair (0.20 to 0.40), or poor (<0.20).
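The text does not specify how the kappa value was derived from LOOCV. A common approach, assumed in this sketch, is to refit the model with each patient left out, predict that patient's class, and compute Cohen's kappa between these predictions and the pathological diagnoses. The grading uses the conventional bands (very good 0.80-1.00, good 0.60-0.80, moderate 0.40-0.60, fair 0.20-0.40, poor <0.20), treating each lower bound as inclusive; the function names are hypothetical.

```python
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two label sequences of equal length.

    Here `a` could be the pathological diagnosis and `b` the class predicted
    for each patient while that patient was left out (LOOCV).
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    # Chance agreement from the marginal label frequencies of each sequence.
    expected = sum(
        (sum(x == c for x in a) / n) * (sum(y == c for y in b) / n) for c in labels
    )
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def grade_kappa(kappa: float) -> str:
    """Map kappa to agreement bands (lower bound of each band inclusive)."""
    if kappa >= 0.80:
        return "very good"
    if kappa >= 0.60:
        return "good"
    if kappa >= 0.40:
        return "moderate"
    if kappa >= 0.20:
        return "fair"
    return "poor"
```

Under this grading, the reported kappa of 0.79 falls in the "good" band.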

Statistical analysis
Statistical analyses were performed using R software (version 4.1.3) and SPSS 26.0 (Chicago, IL). Categorical variables were compared using the chi-square test or, when necessary, Fisher's exact test. Normally distributed continuous variables were compared using Student's t-test. All reported significance levels were two-sided, and p<0.05 was considered statistically significant.
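For a binary variable compared between two groups (a 2×2 table), the chi-square test can be computed directly. This Python sketch uses the fact that the chi-square distribution with one degree of freedom has survival function erfc(sqrt(x/2)). It omits Yates' continuity correction, which R's `chisq.test` applies to 2×2 tables by default, so its p-values can differ slightly from R's. It also returns the smallest expected count, assuming the common rule of thumb of switching to Fisher's exact test when any expected count is below 5. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Pearson chi-square test (no Yates correction) for a 2x2 table.

    Returns (statistic, p_value, min_expected_count). All row and column
    totals must be positive.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    min_expected = math.inf
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            min_expected = min(min_expected, expected)
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, min_expected


if __name__ == "__main__":
    stat, p, min_exp = chi_square_2x2([[10, 20], [20, 10]])
    print(round(stat, 3), round(p, 4), min_exp)
```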
The "caret" package in R was used to randomly split the data, assigning 60% to the training cohort and the remaining 40% to the validation cohort; the same package was also used for cross-validation. The "glmnet" package was used for LASSO regression, and the "glm" function was used for logistic regression analysis. The "Cairo" package was used to plot the model. The "pROC" package was used to plot ROC curves and calculate AUCs, which were compared using DeLong's test. The "calibrate" function was used for calibration curves, and the "decision_curve" function was used to perform DCA.
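For reference, the two fitting steps named above have standard textbook formulations. The sketch below assumes that glmnet was run with family = "binomial" and alpha = 1 (the pure LASSO penalty), and that, as is common in radiomics studies, the radiomics score is the fitted linear predictor. Here y_i in {0, 1} is the pathological label of lesion i, x_i is its vector of normalized texture features, and lambda is the penalty chosen by cross-validation (for example with cv.glmnet). These are generic equations, not the coefficients fitted in this study.

```latex
% Multivariable logistic regression model
\Pr(Y_i = 1 \mid \mathbf{x}_i)
  = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}\right)\right)}

% LASSO-penalized logistic objective (glmnet, binomial family, alpha = 1)
\min_{\beta_0,\,\boldsymbol{\beta}}\;
  -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i\left(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}\right)
  - \log\!\left(1 + e^{\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}}\right)\Big]
  + \lambda \lVert \boldsymbol{\beta} \rVert_1

% Radiomics score as the fitted linear predictor (assumed construction)
\text{Rad-score}_i = \hat{\beta}_0 + \mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}}
```

The L1 penalty drives most coefficients exactly to zero, which is how a small subset of features can be retained from a much larger set of reproducible candidates.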

Clinical and ultrasonic characteristics
The clinical and ultrasonic characteristics of the training cohort and the validation cohort are shown in Table 1. There were no statistically significant differences in any of the 14 observation indexes (p>0.05) between the two cohorts, indicating good consistency between them.

Radiomic analysis
Based on the training cohort, we extracted 279 texture features from each ROI. In the reproducibility analysis of the ROIs drawn by the two ultrasound physicians, 250 radiomic features showed good consistency (ICC ≥ 0.75). LASSO regression (Figure 3) selected the following six optimal features: Skewness, Horzl_RLNonUni, Horzl_GLevNonU, WavEnLL_s.3, WavEnLH_s.3, and WavEnLH_s.4. Based on these six features, the radiomics score was calculated using the following formula:

Development and validation of the model
In the training cohort, univariate analysis was performed on 14 observation indexes (Table 2). Variables with p<0.05 were entered into a multivariate logistic regression to determine the independent risk factors for the model (Table 2). With the radiomics score, BI-RADS category, and palpable mass as independent risk variables (p<0.05), the logistic regression model was established using the following function (Table 3):
The diagnostic performance of the model and the radiomics score was evaluated by ROC analysis (Figure 5), with the AUC used to quantify discrimination. In the training cohort, the AUCs of the model and the radiomics score were 0.978 (95% CI: 0.960-0.997) and 0.907 (95% CI: …).
The specificity, sensitivity, accuracy, Youden index, negative predictive value, positive predictive value, false positive rate, true positive rate, true negative rate, and false negative rate of the model and the radiomics score in the training cohort, the validation cohort, the total dataset, and the independent validation cohort are shown in Table 5. The Brier score of 0.066 indicated good overall accuracy of the model. The calibration curve showed good agreement between the predicted probabilities and the pathological results (Figure 6). The DCA for the model (Figure 7) showed that at threshold probabilities above 5%, using the model to distinguish SA from IDC yielded a greater net benefit than either the treat-all scheme (assuming all lesions are IDC) or the treat-none scheme (assuming all lesions are SA). Based on LOOCV, the kappa value of the model was 0.79, indicating good stability.
According to the model, a lower radiomics score, a higher BI-RADS category, and the presence of a palpable mass were each associated with a higher probability of IDC.
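The evaluation quantities reported above can be sketched in plain Python. This assumes outcomes coded as 1 for the class treated as positive (here IDC, matching the treat-all scheme) and 0 otherwise. AUC is computed as the Mann-Whitney probability that a positive case outranks a negative one, the Brier score as the mean squared error of predicted probabilities, and net benefit with the standard decision-curve formula TP/n - (FP/n) * pt / (1 - pt). The function names are hypothetical. This is not the pROC or decision_curve code used in the study, and DeLong's test and confidence intervals are omitted.

```python
from typing import Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Empirical AUC: probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def net_benefit(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of treating every case with prob >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the treat-all strategy (every lesion assumed positive)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

The treat-none strategy always has a net benefit of 0, so a decision curve compares the model's net_benefit against both net_benefit_treat_all and zero across a range of thresholds.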

Discussion
We developed and validated an ultrasound-based radiomics model, incorporating the radiomics score, BI-RADS category, and palpable mass, to distinguish between SA and IDC. Although the radiomics score alone achieved a high AUC, the model showed better diagnostic efficacy and clinical utility, indicating the advantage of the combined model for lesion identification.
SA is a benign proliferative breast lesion that mimics IDC and is usually asymptomatic or presents only as a palpable mass. Previous studies have confirmed that SA can imitate IDC clinically, radiologically, and pathologically, so it is necessary to distinguish between the two (1-5, 7). As a convenient, affordable, and radiation-free examination, US is one of the most widely used breast imaging techniques. Liu et al. found that the US BI-RADS atlas and elastography are powerful tools for diagnosing SA (1). Shao et al. reported that enhanced US could improve the diagnostic accuracy for SA (23). However, these studies relied on subjective analysis or expensive examinations. Texture analysis is a newer computer-aided technique that quantitatively analyzes image information through algorithms and can reduce the subjectivity of US examination and BI-RADS classification (10,11). To our knowledge, no study has used ultrasound texture analysis to distinguish between SA and IDC. Using LASSO regression, we selected six radiomic features: one histogram parameter (Skewness), two gray-level run-length matrix (RLM) parameters (Horzl_RLNonUni and Horzl_GLevNonU), and three Haar wavelet transform parameters (WavEnLL_s.3, WavEnLH_s.3, and WavEnLH_s.4). The texture features were normalized in MaZda. According to their coefficients, Skewness, Horzl_RLNonUni, and Horzl_GLevNonU were negatively correlated with the radiomics score; that is, the larger these features, the lower the radiomics score and the higher the probability of IDC. Conversely, the three Haar wavelet transform parameters were all positively correlated with the radiomics score, so that larger values of these parameters corresponded to a higher radiomics score and a lower probability of IDC. Skewness appeared to contribute most to the radiomics score.
FIGURE 4 The nomogram was established based on the model.
A histogram is computed from pixel intensities without considering the spatial relationships between pixels within the image (12). High skewness, a histogram-derived characteristic, indicates an asymmetric distribution with a long right tail, and a tumor with high skewness of signal intensity is mainly composed of fibrosis or stroma. In this study, skewness was positively correlated with malignancy, which may be related to the high gray intensity of the image caused by hyperplasia, fibrosis, calcification, and tumor cell accumulation in IDC glands. Previous studies have shown that high mammographic density independently predicts breast cancer risk and that high tumor skewness might be related to poor survival (24-26); our observations are consistent with these reports. On a gray-level image, the RLM quantifies the coarseness of texture in a specific direction. Gray-level non-uniformity reaches its lowest value when runs are equally distributed across gray levels, and run-length non-uniformity is low when runs are equally distributed across run lengths (27). In our study, Horzl_RLNonUni and Horzl_GLevNonU were negatively correlated with the radiomics score, indicating that the gray levels and run lengths of IDC were non-uniform, which is consistent with the ultrasonic features we observed in IDC. The wavelet transform provides time/space and frequency (scale) resolution of a signal or image and captures image details at different frequencies, reflecting the fine features of the image. The clearer the image or the richer its frequency content, the higher the parameter value.
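The three feature families described above can be illustrated with a minimal pure-Python sketch. It assumes textbook definitions: population skewness of the ROI gray levels; Galloway's gray-level and run-length non-uniformity computed from horizontal runs (the "Horzl" direction) and normalized by the total number of runs; and subband energy as the mean squared coefficient of an orthonormal Haar decomposition, with the LL band decomposed again at each scale. MaZda's own gray-level quantization, normalization, band naming, and energy definitions may differ, so this code is not expected to reproduce MaZda's values. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]


def skewness(values: Sequence[float]) -> float:
    """Population skewness m3 / m2**1.5 (undefined for a constant ROI)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def horizontal_runs(img: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """(gray_level, run_length) for every run of equal pixels along each row."""
    runs = []
    for row in img:
        start = 0
        for i in range(1, len(row) + 1):
            if i == len(row) or row[i] != row[start]:
                runs.append((row[start], i - start))
                start = i
    return runs


def run_length_nonuniformities(img: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """(gray-level non-uniformity, run-length non-uniformity), each divided
    by the total number of runs."""
    runs = horizontal_runs(img)
    total = len(runs)
    gray_counts = Counter(g for g, _ in runs)
    length_counts = Counter(length for _, length in runs)
    gln = sum(c * c for c in gray_counts.values()) / total
    rln = sum(c * c for c in length_counts.values()) / total
    return gln, rln


def _haar_1d(v: Sequence[float]) -> Tuple[List[float], List[float]]:
    s = 1 / math.sqrt(2)  # orthonormal Haar scaling
    lo = [(v[i] + v[i + 1]) * s for i in range(0, len(v), 2)]
    hi = [(v[i] - v[i + 1]) * s for i in range(0, len(v), 2)]
    return lo, hi


def _haar_2d(img: Image) -> Dict[str, Image]:
    """One 2-D Haar level. Band name: first letter = row (horizontal) filter,
    second letter = column (vertical) filter (an assumed convention)."""
    row_lo, row_hi = zip(*(_haar_1d(row) for row in img))

    def filter_columns(mat):
        lo_cols, hi_cols = zip(*(_haar_1d(list(col)) for col in zip(*mat)))
        return [list(r) for r in zip(*lo_cols)], [list(r) for r in zip(*hi_cols)]

    ll, lh = filter_columns(row_lo)
    hl, hh = filter_columns(row_hi)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


def haar_energies(img: Image, levels: int) -> Dict[Tuple[str, int], float]:
    """Mean squared coefficient of each subband at scales 1..levels."""
    energies = {}
    current = [list(map(float, row)) for row in img]
    for scale in range(1, levels + 1):
        if len(current) % 2 or len(current[0]) % 2:
            raise ValueError("image sides must be divisible by 2**levels")
        bands = _haar_2d(current)
        for name, band in bands.items():
            coeffs = [c for row in band for c in row]
            energies[(name, scale)] = sum(c * c for c in coeffs) / len(coeffs)
        current = bands["LL"]  # decompose the approximation band further
    return energies
```

In this sketch, a smooth (blurred) region puts almost all of its energy into the LL band, while sharp edges and fine texture raise the detail-band energies, which is the intuition behind linking wavelet energy to image clarity.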
The Haar wavelet has been used for feature extraction in breast cancer diagnosis in many studies (28). In this study, the three selected Haar wavelet transform parameters were all positively correlated with the radiomics score, suggesting that IDC texture images were more blurred. This may be due to the cellular heterogeneity of IDC and the proliferation of tumor blood vessels, which predispose to necrosis and blur the tumor image.
Beyond the promising performance of the radiomics score alone, our model, which combined ultrasonic characteristics, BI-RADS, clinical information, and radiomic features, is affordable and objective, suggesting that combining texture analysis with ultrasonic features and clinical manifestations is beneficial in clinical practice. Following univariate logistic regression, the candidate indexes were fitted in turn, and three characteristics were retained as indicators for distinguishing between SA and IDC. Soo-Yeon Kim et al. reported that BI-RADS 4B or 5 was independently associated with malignancy and had a high upgrade rate (29). Based on our findings, BI-RADS 3 or 4A suggests that SA is possible, whereas higher categories tend to indicate malignancy, and a palpable mass with a lower radiomics score further suggests IDC. These results are broadly consistent with previous research (1,29,30). In addition, the multivariate logistic regression analysis controlled for confounding factors, yielding three final variables (the radiomics score, BI-RADS, and palpable mass), which were used as independent predictors to develop the model.
There are some limitations of the current study that need to be addressed in further investigations. (1) This was a retrospective analysis, so it was difficult to completely overcome the operator dependency of the initial examination, and some bias was inevitable. (2) This was a single-center study, so the number of SA and IDC cases was limited; the performance of the model needs to be verified in other centers and in larger cohorts. (3) We included only patients with SA and IDC; differences in texture features among the pathological subtypes of breast cancer and adenosis can be analyzed in the future.
FIGURE 6 Calibration curve for the model in the training cohort (A), validation cohort (B), total dataset (C), and independent validation cohort (D).
FIGURE 7 Decision curve analysis for the model and radiomics score.

Conclusion
The model in our study, based on radiomics, ultrasonic features, and clinical manifestations, can be used to distinguish SA from IDC and showed good stability and diagnostic performance. The model can be considered a potential candidate diagnostic tool for breast lesions and can contribute to effective clinical diagnosis and treatment.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.