A Comparative Assessment of MR BI-RADS 4 Breast Lesions With Kaiser Score and Apparent Diffusion Coefficient Value

Objectives: To investigate the diagnostic performance of the Kaiser score and the apparent diffusion coefficient (ADC) in differentiating Breast Imaging Reporting and Data System (BI-RADS) category 4 lesions at dynamic contrast-enhanced (DCE) MRI. Methods: This single-institution retrospective study included patients who underwent breast MRI from March 2020 to June 2021. All image data were acquired with a 3-T MRI system. The Kaiser score of each lesion was assigned by an experienced breast radiologist, and Kaiser score+ was determined by combining the ADC and the Kaiser score. Receiver operating characteristic (ROC) curve analysis was performed to evaluate the diagnostic performance of Kaiser score+, Kaiser score, and ADC. The areas under the curve (AUCs) were calculated and compared using the DeLong test, and differences in sensitivity and specificity between indicators were assessed with the McNemar test. Results: The study involved 243 women (mean age, 43.1 years; range, 18–67 years) with 268 MR BI-RADS 4 lesions. The overall diagnostic performance of the Kaiser score (AUC, 0.902) was significantly higher than that of ADC (AUC, 0.810; p = 0.004). There was no significant difference in AUC between the Kaiser score and Kaiser score+ (p = 0.134). The Kaiser score was superior to ADC in avoiding unnecessary biopsies (p < 0.001). Compared with the Kaiser score alone, Kaiser score+ increased specificity by 7.82%, but at the cost of lower sensitivity. Conclusion: For MR BI-RADS category 4 breast lesions, the Kaiser score was superior to ADC mapping in its potential to avoid unnecessary biopsies. However, combining both indicators did not significantly improve breast cancer diagnosis in this subgroup.


INTRODUCTION
Worldwide, breast cancer is the most frequently diagnosed malignant tumor in women and the leading cause of cancer-related death among women (1,2). Dynamic contrast-enhanced (DCE) MRI is an effective tool for distinguishing malignant from benign breast lesions, with high sensitivity (3)(4)(5). The American College of Radiology (ACR) Breast Imaging Reporting and Data System (BI-RADS) lexicon provides a standardized and structured description of breast lesions (6). BI-RADS category 4 lesions are suspicious for malignancy, and biopsy is generally recommended. However, the probability of malignancy of BI-RADS 4 lesions ranges from 2% to 95% (5,7), indicating that a large number of benign lesions undergo unnecessary invasive procedures, which increases the psychological and financial burden on patients. New approaches are therefore needed to improve the diagnostic assessment of BI-RADS 4 breast lesions.
The Kaiser score is a clinical decision rule that incorporates several BI-RADS diagnostic criteria. It has demonstrated robust performance in the assessment of breast lesions, with excellent sensitivity and specificity (8)(9)(10)(11)(12)(13), and could therefore help avoid unnecessary biopsies. The Kaiser score consists of 11 categories ranging from 1 to 11, each corresponding to a distinct likelihood of malignancy (13). A biopsy is recommended if the score exceeds 4 (8,9).
Diffusion-weighted imaging (DWI) has been widely used for the assessment of breast disease (14)(15)(16). The apparent diffusion coefficient (ADC) derived from DWI data quantitatively reflects microstructural changes in biological tissues (16,17). In general, ADCs of benign lesions are significantly higher than those of malignant lesions (14,18). Consequently, lesions with ADCs above a cutoff value could potentially be regarded as benign, which may avoid unnecessary interventions (15,19). Baltzer et al. (15) reported that an ADC >1.4 × 10⁻³ mm²/s effectively excluded malignancy, with a sensitivity of 100%. Clauser et al. (19) found that applying an ADC cutoff value of 1.5 × 10⁻³ mm²/s could downgrade BI-RADS 4 lesions and potentially reduce unnecessary biopsies by 32.6%.
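The relation between DWI signal and ADC, and the rule-out logic described above, can be sketched as follows. This is a minimal illustration assuming the standard monoexponential model S(b) = S0·exp(−b·ADC) with two b-values; the signal intensities and the b-value of 800 s/mm² are hypothetical (the actual protocol parameters are in Table 1), and the function names are illustrative only.

```python
import math

# Rule-out cutoff cited in the text (Baltzer et al.), in mm^2/s.
ADC_CUTOFF = 1.4e-3


def adc_two_point(s0: float, sb: float, b: float) -> float:
    """Monoexponential ADC in mm^2/s from two DWI signals.

    Assumes S(b) = S0 * exp(-b * ADC), hence ADC = ln(S0 / S(b)) / b,
    with b in s/mm^2.
    """
    if s0 <= 0 or sb <= 0 or b <= 0:
        raise ValueError("signals and b-value must be positive")
    return math.log(s0 / sb) / b


def above_rule_out_cutoff(adc: float, cutoff: float = ADC_CUTOFF) -> bool:
    """True if the ADC exceeds the cutoff, i.e. the lesion is a downgrading candidate."""
    return adc > cutoff


if __name__ == "__main__":
    # Hypothetical signals at b = 0 and b = 800 s/mm^2 (illustrative only).
    adc = adc_two_point(s0=1000.0, sb=300.0, b=800.0)
    print(f"ADC = {adc:.2e} mm^2/s; above cutoff: {above_rule_out_cutoff(adc)}")
```

Note that the cutoff is applied with a strict inequality (ADC > cutoff), matching the wording used for the rule-out criterion; an ADC exactly at the cutoff is not downgraded.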
The use of the Kaiser score (8,9,20) and ADC (19,21,22) to reduce unnecessary breast biopsies has been independently validated. Recently, a multicenter study reported that combining these two parameters did not improve diagnostic performance in the evaluation of breast lesions (10). However, that study included breast lesions initially assigned as BI-RADS 0, 4, or 5 at mammography and/or breast ultrasonography. Whether integrating both indicators improves the diagnostic assessment of BI-RADS 4 breast lesions at DCE MRI remains unclear.
Therefore, the purpose of this study was to assess the diagnostic performance of the combination of ADC and Kaiser score for MR BI-RADS 4 breast lesions and to compare it with that of the Kaiser score alone. In addition, the effect of background parenchymal enhancement (BPE) on the performance of the combined indicator was investigated.

Study Population
This retrospective study was approved by our Institutional Review Board, and the requirement for written informed consent was waived. All patient data were obtained from the Picture Archiving and Communication System (PACS) and the Electronic Medical Record System (EMRS) at our institution. From March 2020 to June 2021, we consecutively reviewed 623 female patients who underwent MRI examinations. Of these, 380 patients were excluded for the following reasons: (1) chemotherapy or surgery before the MR examination (n = 202); (2) lesions assigned as BI-RADS category 2, 3, or 5 at DCE MRI (n = 167); (3) no available histopathological results (n = 10); and (4) borderline tumor (n = 1). Finally, a total of 243 patients with 268 lesions were included in our study (Figure 1).

MRI Protocol
All image data were acquired with a 3-T MRI system (SIGNA Pioneer, GE Healthcare, Waukesha, WI, USA) with an eight-channel breast coil. All patients were examined in the prone position with a state-of-the-art MRI protocol (3). Detailed parameters are summarized in Table 1.

Reference Standard
The histopathology of all lesions was regarded as the reference standard. All lesion specimens were obtained by biopsy or surgery, which were subsequently analyzed by two board-certified pathologists (with 5 and 10 years of experience, respectively).

Image Analysis
The BI-RADS category, BPE, and lesion size of all lesions were extracted from the structured radiology reports. To determine the Kaiser score category, one experienced breast radiologist (QX, with 15 years of experience in reading breast MR images) interpreted all examinations according to the Kaiser score system, which consists of five independent diagnostic criteria [root sign, time-signal intensity curve (TIC) type, lesion margins, internal enhancement pattern, and peritumoral edema], as investigated in previous studies (10,11,13,20,23,24). Subsequently, another experienced breast radiologist (LL, with 18 years of experience in reading breast MR images) randomly evaluated 100 consecutive cases to assess interobserver agreement. Both readers were blinded to histopathological findings and BI-RADS categorization. The final Kaiser score category of each lesion was calculated and recorded. The flowchart of the Kaiser score system is shown in Figure 2.
To measure the ADC of each lesion, all ADC maps were retrieved and transferred to a dedicated workstation (AW 4.7, GE Healthcare). We used the third method for breast tissue selection as categorized in a meta-analysis by Wielema et al. (25). Two breast radiologists (QX and LL) independently drew regions of interest (ROIs) on the ADC map, using the DCE MRI as reference. The ROI included solid areas of the lesion and excluded areas with visible necrosis, cystic change, or hemorrhage. The corresponding ADC was then recorded, and the average of the two readers' measurements was used as the final value.
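The measurement steps above (ROI restricted to solid tissue, then averaging across the two readers) can be sketched as below. This is a minimal illustration, not the workstation's algorithm: it assumes the recorded ADC is the mean of the ROI voxels, and the voxel values, exclusion flags, and function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def roi_mean_adc(voxel_adcs: Sequence[float], excluded: Sequence[bool]) -> float:
    """Mean ADC over ROI voxels, skipping voxels flagged as necrotic, cystic, or hemorrhagic.

    Assumption: the recorded lesion ADC is the ROI mean.
    """
    if len(voxel_adcs) != len(excluded):
        raise ValueError("each voxel needs an exclusion flag")
    solid = [adc for adc, skip in zip(voxel_adcs, excluded) if not skip]
    if not solid:
        raise ValueError("ROI contains no solid voxels")
    return mean(solid)


def final_lesion_adc(reader_adcs: Sequence[float]) -> float:
    """Final lesion ADC as the average of the readers' ROI measurements."""
    return mean(reader_adcs)


if __name__ == "__main__":
    # Hypothetical voxel ADCs (mm^2/s); reader 1's third voxel is cystic and excluded.
    reader1 = roi_mean_adc([1.1e-3, 1.2e-3, 2.8e-3, 1.3e-3], [False, False, True, False])
    reader2 = roi_mean_adc([1.2e-3, 1.3e-3], [False, False])
    print(f"final ADC = {final_lesion_adc([reader1, reader2]):.3e} mm^2/s")
```

The example shows why excluding non-solid areas matters: keeping the cystic voxel would pull reader 1's value well above the solid-tissue ADC.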
As in previous studies (10,13), the ADC was combined with the Kaiser score to obtain the indicator Kaiser score+. To choose the most effective ADC threshold for Kaiser score+, we tested four thresholds and found 1.4 × 10⁻³ mm²/s to be optimal, the same value reported in the literature (10). If the ADC of a lesion exceeded 1.4 × 10⁻³ mm²/s, its Kaiser score was reduced by 1 point; otherwise, the Kaiser score was unchanged. The diagnostic threshold (>4) was then applied to the resulting Kaiser score+. In our testing, this combination method gave the best diagnostic performance. The procedure for finding the best combination method for Kaiser score+ is detailed in Supplementary Material 1.
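The Kaiser score+ rule described above can be written as a small function. This is a sketch of the stated rule only (subtract 1 point when ADC > 1.4 × 10⁻³ mm²/s, then call the lesion positive if the score exceeds 4); the function names are hypothetical, and the text does not say whether Kaiser score+ is clamped at 1, so the sketch does not clamp.

```python
ADC_THRESHOLD = 1.4e-3  # mm^2/s, threshold used for Kaiser score+ in the text
BIOPSY_THRESHOLD = 4     # biopsy recommended when the score exceeds 4


def kaiser_score_plus(kaiser_score: int, adc: float) -> int:
    """Subtract 1 point from the Kaiser score when ADC exceeds the threshold.

    The text does not state whether the result is clamped at 1; this sketch
    does not clamp, which cannot change any decision at the >4 cutoff.
    """
    if not 1 <= kaiser_score <= 11:
        raise ValueError("Kaiser score must be between 1 and 11")
    return kaiser_score - 1 if adc > ADC_THRESHOLD else kaiser_score


def biopsy_recommended(score: int) -> bool:
    """Positive (biopsy) call for a Kaiser score or Kaiser score+ above 4."""
    return score > BIOPSY_THRESHOLD
```

For the metaplastic carcinoma described in the Discussion (Kaiser score 5, ADC 1.57 × 10⁻³ mm²/s), `kaiser_score_plus(5, 1.57e-3)` returns 4, so `biopsy_recommended` changes from True to False. Only lesions scored exactly 5 can change category under this rule, which is why Kaiser score+ can raise specificity at the cost of sensitivity.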

Statistical Analysis
The data were analyzed using SPSS 26.0 (IBM) and MedCalc 19.8 (MedCalc Software). The intraclass correlation coefficient (ICC) (26,27) was used to assess interobserver agreement. Quantitative data that were not normally distributed were expressed as median and interquartile range (IQR) and compared using the Mann-Whitney U test. Categorical data were analyzed using the chi-squared test or Fisher's exact test. ROC curves were plotted to determine the performance of each parameter. In a subgroup analysis, the impact of BPE (minimal, mild, moderate, and marked) on the diagnostic performance of all quantitative parameters was investigated. The DeLong test was used to compare the correlated areas under the ROC curves (AUCs) obtained from the same lesions. The cutoff values Kaiser score >4 and ADC ≤1.4 × 10⁻³ mm²/s were applied to classify lesions as malignant (22,24). Sensitivity and specificity were calculated and compared using the McNemar test. p < 0.05 was considered statistically significant. Table 2.
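Sensitivity, specificity, and the McNemar comparison can be sketched as follows. The text does not state which McNemar variant MedCalc applied; this sketch assumes the exact (binomial) form, in which only the discordant pairs matter: b cases positive by one method and negative by the other, and c the reverse. Function names are illustrative.

```python
from math import comb


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts b and c.

    Under H0, b follows Binomial(b + c, 0.5); the p-value doubles the
    smaller tail and is capped at 1.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

As a check against the Results: the Kaiser score missed 6 cancers and the ADC missed 10, with 3 missed by both, so 3 were missed only by the Kaiser score and 7 only by the ADC. `mcnemar_exact(3, 7)` gives 0.34375, consistent with the reported p = 0.344 for the sensitivity comparison. If a single lesion was discordant between Kaiser score and Kaiser score+, `mcnemar_exact(1, 0)` returns 1.0, matching the reported p = 1.000. This agreement suggests, but does not confirm, that the exact form was used.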

Interobserver Agreement
The ICC was 0.9126 (95% CI, 0.8702–0.9412) for the Kaiser score and 0.9972 (95% CI, 0.9964–0.9978) for the ADC. Therefore, the Kaiser score and the ADC measured by the two readers showed excellent agreement.
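The ICC form used is not specified in the text. As an illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from an n × k table of ratings, with one column per reader (here these would be QX and LL). Whether this matches the authors' ICC model is an assumption.

```python
from statistics import mean
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the measurement of subject i by rater j.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

Because this form penalizes systematic offsets between readers (through the rater mean square), it is stricter than a consistency ICC; on the classic Shrout and Fleiss six-subject, four-judge example it gives about 0.29.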

Comparison of the ROC Curves
For all lesions, the AUCs of Kaiser score+ (0.906) and Kaiser score (0.902) were significantly higher than that of ADC (0.810) (p = 0.002 and p = 0.004, respectively), whereas the AUCs of Kaiser score+ and Kaiser score did not differ significantly (p = 0.134) (Table 4). In the subgroup with marked BPE (BPE 4), the differences in AUC between Kaiser score+ and ADC and between Kaiser score and ADC were significant (p = 0.004 and p = 0.007, respectively), whereas those in the remaining subgroups were not (all p > 0.05). Of note, both Kaiser score+ and Kaiser score showed satisfactory diagnostic performance regardless of BPE (Figure 3 and Table 4).
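The pairwise AUC comparisons above rely on the DeLong test for correlated ROC curves measured on the same lesions. The following is a minimal stdlib sketch of that test (Mann-Whitney AUC plus DeLong structural components), not MedCalc's implementation; function names are illustrative. Because lower ADC indicates malignancy, ADC values would be negated before being passed in (for example, `[-a for a in adc_values]`, where `adc_values` is a hypothetical list).

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if x > y, 0.5 on ties, else 0."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def _placements(pos: List[float], neg: List[float]) -> Tuple[float, List[float], List[float]]:
    """AUC plus DeLong components V10 (per positive) and V01 (per negative)."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a: List[float], b: List[float]) -> float:
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_paired(scores1: Sequence[float], scores2: Sequence[float],
                  labels: Sequence[int]) -> Tuple[float, float, float]:
    """Compare two correlated AUCs on the same cases; higher score = more suspicious.

    labels are 1 (malignant) or 0 (benign). Returns (auc1, auc2, two-sided p).
    """
    if not (len(scores1) == len(scores2) == len(labels)):
        raise ValueError("inputs must have equal length")
    pos1 = [s for s, y in zip(scores1, labels) if y == 1]
    neg1 = [s for s, y in zip(scores1, labels) if y == 0]
    pos2 = [s for s, y in zip(scores2, labels) if y == 1]
    neg2 = [s for s, y in zip(scores2, labels) if y == 0]
    m, n = len(pos1), len(neg1)
    if m < 2 or n < 2:
        raise ValueError("need at least two cases per class")
    auc1, v10_1, v01_1 = _placements(pos1, neg1)
    auc2, v10_2, v01_2 = _placements(pos2, neg2)
    # Variance of AUC1 - AUC2, accounting for the pairing through the covariances.
    var = ((_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
           + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n)
    if var <= 0:
        # Degenerate case (e.g., identical rankings): no measurable difference.
        return auc1, auc2, 1.0 if auc1 == auc2 else math.nan
    z = (auc1 - auc2) / math.sqrt(var)
    return auc1, auc2, math.erfc(abs(z) / math.sqrt(2))
```

The covariance terms are what distinguish this paired test from comparing two independent AUCs: when both indicators rank the same lesions similarly, the variance of the difference shrinks and smaller AUC differences can reach significance.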

Sensitivity and False-Negative Lesions
For all lesions, the sensitivities of Kaiser score+ and ADC were lower than that of the Kaiser score, but not significantly (p = 1.000 and p = 0.344, respectively) (Table 5). Six breast cancers (three mucinous carcinomas, one ductal carcinoma in situ, one invasive ductal carcinoma, and one medullary carcinoma) were missed by the Kaiser score, and 10 (five mucinous carcinomas, three ductal carcinomas in situ, one metaplastic carcinoma, and one papillary carcinoma) were missed by the ADC. Of these 16 false-negative findings, three mucinous carcinomas were missed by both the Kaiser score and the ADC, so 13 distinct lesions were affected. The details of false-negative lesions are provided in Table 6. A clinical example is provided in Figure 4.

DISCUSSION
In this study, we found that the Kaiser score was superior to ADC mapping in its potential to avoid unnecessary biopsies of MR BI-RADS category 4 breast lesions. This rate could potentially be increased further by combining the ADC value with the Kaiser score, albeit at the cost of lower sensitivity; however, the differences between Kaiser score and Kaiser score+ were not statistically significant. DWI is a functional imaging technique that has been widely used to improve the diagnostic accuracy of breast MRI (14)(15)(16). It quantitatively assesses water diffusion in breast tissue through the ADC (16,17). In malignant breast lesions, tumor-cell proliferation compresses the extracellular space and hinders water diffusion, so the DWI signal increases and the ADC decreases, as observed in this study. As a new diagnostic tool, quantitative ADC is a promising marker for the assessment of breast lesions (28).
The Kaiser score is a clinical decision rule that integrates the five most common diagnostic features: root sign, TIC type, lesion margins, internal enhancement pattern, and peritumoral edema (10,11,13,20,23,24). We also tested the effectiveness of each feature in the Kaiser score; further details are shown in Supplementary Material 2. Multivariable logistic regression analysis showed that all of these features except margins were significantly and independently associated with a breast cancer diagnosis. Moreover, the diagnostic performance of the regression model did not differ significantly from that of the Kaiser score. This might explain why the diagnostic performance of the Kaiser score was robust, and also why both the Kaiser score and Kaiser score+ performed satisfactorily across all BPE subgroups. The Kaiser score is a simple and practical tool for breast radiologists who read images of varying quality. Its values range from 1 to 11, each associated with a distinct probability of malignancy (13), and a biopsy is recommended if the score exceeds 4 (8,9). This approach has been validated in non-mass enhancement lesions on MRI (29), in suspicious MRI-only lesions (11), and in lesions presenting as calcifications on mammography (8).
Both ADC and Kaiser score could be regarded as useful imaging biomarkers to support clinical decision-making in managing BI-RADS 4 lesions. Using ADC with a threshold of 1.4 × 10⁻³ mm²/s to avoid unnecessary biopsies, we obtained a higher specificity (47.6%) than the 32.9% reported by Dietzel et al. (10). Possible reasons are as follows. First, this study was performed at a single institution and all data were acquired with one protocol, which may have led to overestimation. Second, we did not analyze small lesions in this study. When the ADC of a small lesion is measured, the ROI may include normal breast tissue, which can raise the measured ADC and thus lower the performance of ADC; excluding such lesions may therefore have led to overestimated results. Careful standardization of ADC measurement is essential. These discrepancies may also explain the higher specificity of Kaiser score+ in our study. Our study validates the previous work but focuses specifically on MR BI-RADS 4 lesions. More critical evaluation, covering as many systems, protocols, and centers as possible, will be considered in follow-up work. Consistent with Dietzel et al. (10), our results showed that sensitivity also decreased with Kaiser score+.
Table 5 notes: For all lesions, the Kaiser score and Kaiser score+ showed a similar degree of sensitivity (p = 1.000). Sensitivity did not differ significantly between the Kaiser score and ADC (p = 0.344). Compared with ADC, the Kaiser score achieved significantly higher specificity (p < 0.0001). Values are given as percentages, with absolute numbers in brackets. TP, true positives; TN, true negatives; FP, false positives; FN, false negatives; CI, confidence interval; ADC, apparent diffusion coefficient. *Given as ×10⁻³ mm²/s.
The lesion missed by Kaiser score+ was a rare finding that exhibited atypical morphological patterns for a malignant lesion. On review of the histopathology, the lesion was initially diagnosed as carcinosarcoma and finally as metaplastic carcinoma. Metaplastic carcinoma frequently presents with myxoid matrix, intratumoral hemorrhagic changes, and loose edematous stroma (30,31), which may affect the ADC value. For the same reasons, its morphological features on DCE MRI may appear benign. In this study, the missed lesion showed heterogeneous internal enhancement, delayed plateau enhancement, and irregular margins, and was assigned a Kaiser score of 5. Its ADC value was 1.57 × 10⁻³ mm²/s, consistent with reports in the literature (32). Because this ADC exceeded 1.4 × 10⁻³ mm²/s, Kaiser score+ reduced the score to 4, below the biopsy threshold (>4), resulting in a false-negative diagnosis.
Previous studies have shown that a high level of BPE is associated with breast cancer (33) and that strong BPE may lead to false-negative or false-positive diagnoses (34)(35)(36). However, our results demonstrated that the diagnostic performance of the Kaiser score in the assessment of BI-RADS 4 breast lesions did not differ by BPE (all p > 0.05). We therefore speculate that the Kaiser score may provide reliable guidance regardless of BPE.
Our study also has some limitations. (1) This was a retrospective study conducted at a single institution, and all data were acquired with one protocol, which may have led to overestimated results. Multicenter datasets will be prospectively assessed in further research. (2) We did not evaluate lesions categorized as foci (size <5 mm), which might have led to an overestimated performance of the ADC value. The ADCs of foci may be affected by the partial volume effect, and this area warrants further investigation. Previous studies have shown that the Kaiser score can be applied to foci (9,10). (3) When measuring the ADC, we outlined the ROI on two-dimensional images and avoided visible necrotic, cystic, or hemorrhagic areas, which may not account for lesion heterogeneity.

CONCLUSION
The Kaiser score is superior to ADC mapping in its potential to avoid false-positive biopsies of MR BI-RADS category 4 breast lesions. This rate could potentially be increased further by incorporating ADC measurements into the Kaiser score+, albeit at the cost of lower sensitivity. The combination of both indicators did not significantly contribute to breast cancer diagnosis.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board of Third Affiliated Hospital of Zhengzhou University (Zhengzhou, China). Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements. Written informed consent was not obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
LM: conceptualization, software, data curation, formal analysis, data interpretation, manuscript drafting, and manuscript editing. KW, LL, and QX: conceptualization, data curation, data interpretation, manuscript drafting, and manuscript editing. HS, YC, YG, MH, and YS: data curation, data interpretation, analysis, and manuscript editing. XiaZ and XinZ: conceptualization, formal analysis, data curation, data interpretation, and manuscript editing. All authors contributed to the article and approved the submitted version.