A hematological parameter-based model for distinguishing non-puerperal mastitis from invasive ductal carcinoma

Purpose Non-puerperal mastitis (NPM) accounts for approximately 4-5% of all benign breast lesions. Ultrasound is the preferred method for screening breast diseases; however, similarities in imaging findings can make it challenging to distinguish NPM from invasive ductal carcinoma (IDC). Our objective was to identify convenient and objective hematological markers to distinguish NPM from IDC.
Methods We recruited 89 patients with NPM, 88 with IDC, and 86 with fibroadenoma (FA), and compared their laboratory data at the time of admission. LASSO regression, univariate logistic regression, and multivariate logistic regression were used to screen the parameters for construction of diagnostic models. Receiver operating characteristic curves, calibration curves, and decision curves were constructed to evaluate the accuracy of the models.
Results We found significant differences in routine laboratory data between patients with NPM and IDC, and these indicators were candidate biomarkers for distinguishing between the two diseases. Additionally, we evaluated the ability of some classic hematological markers reported in previous studies to differentiate between NPM and IDC, and the results showed that these indicators are not ideal biomarkers. Using LASSO and logistic regression, we selected age, white blood cell count, and thrombin time to construct a differential diagnostic model that exhibited a high level of discrimination, with an area under the curve (AUC) of 0.912 in the training set and 0.851 in the validation set. Using the same selection method, we also constructed a differential diagnostic model for NPM and FA, which demonstrated good performance with an AUC of 0.862 in the training set and 0.854 in the validation set. Both models achieved higher AUCs than models built using machine learning methods such as random forest, decision tree, and support vector machine (SVM) in both the training and validation sets.
Conclusion Certain laboratory parameters on admission differed significantly between the NPM and IDC groups, and the constructed model is proposed as a differential diagnostic marker. Our analysis showed that it has acceptable efficiency in distinguishing NPM from IDC and may be employed as an auxiliary diagnostic tool.


Introduction
Non-puerperal mastitis (NPM) is a relatively rare benign breast entity, accounting for approximately 4-5% of all benign breast lesions (1). However, the incidence and recurrence rates of NPM have rapidly increased in recent years (2,3). NPM is a chronic inflammatory breast disease that is unrelated to pregnancy and lactation; however, its etiology remains unclear (4). Multiple factors have been associated with its occurrence, such as ductal obstruction, autoimmune system abnormalities, and infection (5-7). The course of NPM can be protracted, and some patients experience recurrence even after multiple surgical interventions (8,9). Therefore, accurate diagnosis and timely intervention are crucial for improving the prognosis.
The clinical manifestations of NPM often present as inflammatory nodules or masses lacking the typical signs (10). During the acute phase, patients may exhibit redness, swelling, heat, pain, and fistula formation (11). Both inflammatory masses and malignant tumors can form new blood vessels, leading to a partial overlap in the clinical symptoms and imaging findings between NPM and invasive ductal carcinoma (IDC) of the breast (12,13). Distinguishing between the two is challenging using conventional ultrasound examination. Clinical presentations of NPM during physical examination also closely resemble those of IDC of the breast, making it difficult to differentiate them based on a single indicator, thus increasing the likelihood of misdiagnosis and treatment delay. Histopathological biopsy of breast tissue is currently the only method used for a definitive diagnosis, but its acceptance rate by patients is relatively low.
Exploiting objective hematological parameters and identifying diagnostic biomarkers for differentiation are currently hot topics in the research of various diseases. This is because of the inherent advantages of easy acquisition, cost-effectiveness, and strong repeatability associated with these indicators. Research into the potential connections between these indicators and diseases may lead to the discovery of simpler and more predictive clinical parameters. Various indicators, such as neutrophil to lymphocyte ratio (NLR), platelet to lymphocyte ratio (PLR), and lymphocyte to monocyte ratio (LMR), have been found to be applicable for the auxiliary diagnosis of several diseases (14,15). However, the discriminative capabilities of hematological indicators in distinguishing between NPM and IDC remain unclear. In recent years, the fields of regression and classification models have seen remarkable advancements, offering more accurate predictive tools.
Concurrently, feature selection techniques have gained prominence for streamlining model building and reducing computational complexity (16). These methodologies are pivotal in various applications, ranging from healthcare to finance. In the realm of data science and machine learning, regression and classification models play pivotal roles in addressing a wide array of prediction and decision-making tasks. Regression models are employed for predicting continuous targets, whereas classification models are geared toward discerning discrete categories. These models span a spectrum, ranging from linear regression to deep neural networks, each excelling in distinct contexts. Feature selection methods constitute a critical phase in model construction, aiding in the identification of which features are most essential for a model's performance (17). By eliminating redundant or irrelevant features, feature selection can enhance a model's generalization capabilities, reduce the risk of overfitting, and expedite the training process. This study aimed to analyze the differences in hematological indicators between NPM and IDC, to evaluate the discriminative abilities of previously reported serological biomarkers of inflammation, and to establish a discriminative diagnostic model based on LASSO and logistic regression or other machine learning methods.

Subjects and study design
We conducted a retrospective review of 89 female patients who underwent biopsy or surgical procedures between May 2012 and February 2023 and were confirmed to have NPM, along with 88 female patients confirmed to have IDC and 86 female patients diagnosed with fibroadenoma (FA). Patients with incomplete clinical, laboratory, or imaging data were excluded. Additionally, patients in the NPM group were screened for other potential causes of breast inflammation, such as breast tuberculosis, fat necrosis, and inflammation due to lactation or pregnancy, and patients with these conditions were excluded from the study. All pathological characteristics of NPM, IDC, and FA were subjected to a double-blind random review by two pathologists. The NPM cases were diagnosed in accordance with the Chinese Society of Breast Surgery (CSBrS) 2021 practice guidelines (8). This study was approved by the Ethics Committee of Jiujiang No.1 People's Hospital. Considering the non-interventional retrospective nature of this study and the utilization of electronic record data coupled with the anonymization of patient information, informed consent from the patients involved was not required.

Statistical analysis
SPSS 23.0 and GraphPad Prism 8.0.2 were used for statistical analysis. The independent samples t-test was applied to analyze differences in continuous variables between two groups if the data followed a normal distribution; otherwise, the Mann-Whitney U test was performed. LASSO and logistic regression were used to screen the parameters for the construction of the diagnostic model. To assess the model's capability, the receiver operating characteristic (ROC), calibration, and decision curves were plotted using R (version 4.0.2). A two-tailed p value of < 0.05 was considered statistically significant. Machine learning methods such as random forest, decision tree, and support vector machine (SVM) were also used to construct discriminative diagnostic models.
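The non-parametric comparison described above can be illustrated with a minimal sketch. The authors ran their tests in SPSS, so the code below is not their implementation. It is a plain Python version of the two-sided Mann-Whitney U test using the normal approximation without tie or continuity correction. The function name `mann_whitney_u` and the WBC values are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie or
    continuity correction). Returns (U statistic for x, approximate p value)."""
    n1, n2 = len(x), len(y)
    # U for x counts the pairs in which x exceeds y; ties count as half a pair.
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_small = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_small - mean_u) / sd_u  # z <= 0 because the smaller U is used
    p = min(1.0, 2 * NormalDist().cdf(z))
    return u1, p


# Hypothetical WBC counts (10^9/L) for two small groups, for illustration only.
wbc_group_a = [9.8, 11.2, 8.7, 10.5, 12.1, 9.1]
wbc_group_b = [6.2, 5.8, 7.1, 6.9, 5.5, 7.4]
u_stat, p_value = mann_whitney_u(wbc_group_a, wbc_group_b)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

For small samples or heavily tied data, an exact or tie-corrected test from a statistics package would be preferable to this approximation.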

Comparison of routine laboratory data for the study populations
In line with prior epidemiological investigations, patients with IDC exhibited a notably higher average age compared with patients with NPM. Additionally, the average age of the patients in the FA group was notably higher than that in the NPM group. Blood biochemistry tests revealed that patients with NPM exhibited lower levels of AST, TBIL, IBIL, GLU, Urea, CREA, CK, and LDH than patients with IDC. Regarding routine blood parameters, patients with NPM exhibited higher levels of WBC, PLT, PDW, NEU%, NEU, LYM, MON, and EOS, along with lower levels of Hb, MCH, MCHC, and LYM% compared to patients with IDC. When comparing the coagulation indices between the two groups, patients with NPM exhibited higher levels of Fbg and TT. Likewise, NPM patients exhibited significant differences in certain indicators compared to FA patients, as shown in Table 1.

Performance evaluation of derived serological markers for differential diagnosis
We compared the derived serological markers among the three groups of patients and found that, compared to the IDC and FA groups, patients with NPM exhibited higher levels of NLR, dNLR, SII, AISI, SIRI, and NMR, along with lower levels of AFR (Table 2). ROC curves indicated that, for distinguishing between NPM and IDC, the AUC values for these derived serological markers were all below 0.70, suggesting poor diagnostic performance (Figure 1A). When distinguishing NPM from FA, although the AUCs of NLR, dNLR, SII, AISI, and SIRI were all above 0.70, they remained below 0.72, indicating only moderate diagnostic performance (Figure 1B). Therefore, it is necessary to explore new biomarkers or models with better discriminatory capability.
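The derived markers compared above are simple ratios and products of routine blood counts. The text does not spell out the formulas, so the sketch below assumes the definitions commonly used in the literature (for example, dNLR as neutrophils divided by white blood cells minus neutrophils). The function name `derived_markers` and the input values are hypothetical.

```python
def derived_markers(neu, lym, mon, plt, wbc):
    """Compute commonly used inflammation indices from blood counts.

    All inputs are absolute counts in 10^9/L. The formulas are the
    conventional definitions and are an assumption, not taken from the text.
    """
    return {
        "NLR": neu / lym,                # neutrophil to lymphocyte ratio
        "dNLR": neu / (wbc - neu),       # derived neutrophil-lymphocyte ratio
        "PLR": plt / lym,                # platelet to lymphocyte ratio
        "LMR": lym / mon,                # lymphocyte to monocyte ratio
        "NMR": neu / mon,                # neutrophil-monocyte ratio
        "SII": plt * neu / lym,          # systemic immune-inflammation index
        "SIRI": neu * mon / lym,         # systemic inflammation response index
        "AISI": neu * plt * mon / lym,   # aggregate index of systemic inflammation
    }


# Hypothetical blood counts for a single patient, for illustration only.
markers = derived_markers(neu=4.0, lym=2.0, mon=0.5, plt=250.0, wbc=7.0)
for name, value in markers.items():
    print(f"{name}: {value:.2f}")
```

Because every index is built from the same few counts, the markers are strongly correlated with one another, which may partly explain why they showed similar AUC values.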

Screening of key indicators and construction of models for differential diagnosis
The NPM and IDC patients were randomly divided in a 1:1 ratio into a training set (NPM, n = 44; IDC, n = 44) and a validation set (NPM, n = 45; IDC, n = 44); the training set was used to build the model, and the validation set was used to verify its accuracy. LASSO regression was used to select key variables for model construction. To keep the model concise, the penalty was set at lambda.1se, so that variables with small regression coefficients were shrunk to zero and eliminated (Figure 2A). The key variables retained for distinguishing NPM from IDC were age, IBIL, Urea, WBC count, LYM%, and TT. Univariate and multivariate logistic regression were then performed on these indicators (Figure 2B), and age, WBC, and TT were ultimately selected for model construction.
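The 1:1 split reported above can be reproduced in outline by shuffling each diagnostic group separately and cutting it in half. The sketch below is an illustration only: the helper `split_half`, the seed, and the patient identifiers are hypothetical, and the authors' actual randomization procedure is not described in the text.

```python
import random


def split_half(ids, seed=0):
    """Shuffle one diagnostic group and split it into two halves.

    With an odd group size, the extra patient goes to the validation set.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical patient identifiers matching the reported group sizes.
npm_ids = [f"NPM{i:03d}" for i in range(89)]
idc_ids = [f"IDC{i:03d}" for i in range(88)]

npm_train, npm_valid = split_half(npm_ids)
idc_train, idc_valid = split_half(idc_ids)
print(len(npm_train), len(npm_valid), len(idc_train), len(idc_valid))
```

Splitting within each group keeps the class balance nearly identical in the training and validation sets, which matches the reported set sizes.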
In the training set, the ROC curve showed an AUC of 0.912 (Figure 2C), and both the calibration (Figure 2D) and decision curves (Figure 2E) demonstrated that this model was reliable for distinguishing between NPM and IDC. Furthermore, the NPM and FA patients were randomly divided in a 1:1 ratio into a training set (NPM, n = 44; FA, n = 43) and a validation set (NPM, n = 45; FA, n = 43), and a model for distinguishing between NPM and FA was constructed using the same key parameter selection method and model construction approach (Figure 3A and B). The ROC curve showed an AUC of 0.862 (Figure 3C), and the calibration (Figure 3D) and decision curves (Figure 3E) indicated good performance in distinguishing between NPM and FA. Figures 4A and 4B show the performance of the two diagnostic models in the validation set, and the diagnostic performance parameters for model 1 and model 2 in the training and validation sets are shown in Figure 4C. Model 1, used to distinguish between NPM and IDC, also demonstrated a high AUC of 0.851 in the validation set, while model 2, used to distinguish between NPM and FA, achieved an AUC of 0.854 in the validation set.
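The AUC values reported above have a direct interpretation: the AUC is the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it by pairwise comparison. The function name `auc` and the predicted probabilities are hypothetical, and the authors computed their curves in R rather than with this code.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve by pairwise comparison.

    Each (positive, negative) pair scores 1 if the positive case has the
    higher score, 0.5 for a tie, and 0 otherwise.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities of NPM, for illustration only.
npm_scores = [0.9, 0.8, 0.4]
idc_scores = [0.7, 0.3, 0.2]
print(f"AUC = {auc(npm_scores, idc_scores):.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. AUC summarizes ranking only and says nothing about calibration, which is why the calibration curves are reported separately.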

Using machine learning approaches to construct the diagnostic discrimination models
Models were constructed in the training sets using three machine learning methods: random forest, decision tree, and SVM. For distinguishing between NPM and IDC, the random forest model achieved the highest AUC, at 0.752 in the training set and 0.751 in the validation set (Figure 5A). For distinguishing between NPM and FA, the random forest model achieved the highest AUC in the training set, at 0.820, and an AUC of 0.706 in the validation set (Figure 5B). In both the training and validation sets, these three machine learning models achieved lower AUCs than the models established in this study using LASSO and logistic regression.

Discussion
NPM is a benign, non-tumorous, non-specific inflammatory breast condition characterized by ductal dilation, extensive infiltration of inflammatory cells, and late-stage ductal and adjacent tissue infiltration and proliferation (20,21). NPM primarily affects non-lactating women aged between 30 and 40 (22). In clinical practice, NPM is relatively uncommon but has been on the rise in recent years. Clinically, only some patients exhibit symptoms such as redness, swelling, and pain, whereas most present with breast lumps accompanied by pain and lack the typical clinical features (23). IDC, the most common type of breast cancer, typically manifests as a hard lump with unclear borders, poor mobility, and some degree of pain. Consequently, there is an overlap in the clinical presentations of NPM and IDC (24,25). Owing to its real-time and noninvasive nature, ultrasound has become an important tool for routine breast examination in women (26,27). However, two-dimensional and color Doppler ultrasound images of both NPM and IDC can show hypoechoic or mixed-echo masses, indistinct borders, irregular shapes, uneven internal echoes, and ductal dilation, leading to imaging overlap (28).
Interpretation of these imaging features can also be influenced by the physician's subjective experience. Therefore, reliance on ultrasound can make differential diagnosis challenging. Research has shown that NPM is frequently confused with IDC, with a high preoperative misdiagnosis rate. Importantly, the treatment approaches for these conditions are markedly different. Therefore, the ability to accurately differentiate NPM from IDC preoperatively is of crucial clinical significance for the diagnosis and treatment of patients with NPM.
Numerous studies have demonstrated the diagnostic value of serologically derived biomarkers for various diseases, including cancer. NLR, PLR, LMR, dNLR, AFR, PNI, SII, AISI, NLPR, SIRI, NMR, NER, and other serological markers have previously demonstrated utility in the differential diagnosis and prognostic assessment of various cancers, including breast cancer (29-31). However, whether these parameters can differentiate between NPM and IDC remains unclear. Based on the ROC curve analysis, we found that these markers had relatively poor discriminative performance for NPM versus IDC and only moderate performance for NPM versus FA. This suggests an urgent need to identify new blood parameters with better discriminative diagnostic capabilities.
Initial laboratory indicators upon admission and before treatment can, to some extent, reflect the true condition of different diseases. In this study, we assessed differences in laboratory characteristics between patients with NPM and those with IDC at the time of admission. Through statistical analysis, we identified several markers that showed significant differences between groups. These markers can serve as important auxiliary references for differential diagnosis, especially when there is an overlap in imaging or clinical symptoms. However, more than 20 markers differed between the groups, which may not be practical for clinical application. Using LASSO and logistic regression, we identified the most crucial components for distinguishing between these two diseases: age, WBC count, and TT. Using these three indicators, we constructed a differential diagnostic model, and the ROC curve confirmed an AUC of 0.912, with a sensitivity of 84.09% and a specificity of 86.36%. Our model achieved a higher AUC than previously reported imaging-based models. For instance, Tang et al. (12) used the magnetic resonance imaging (MRI) volumetric apparent diffusion coefficient to differentiate between NPM and breast cancer with an AUC of only 0.821, which was lower than the AUC of our model. Similarly, another model using grayscale ultrasound (GSUS) and contrast-enhanced ultrasound (CEUS) images achieved an AUC of approximately 0.80 (32), which was also lower than that of our model, although its performance improved significantly when clinical parameters such as age and NEU were incorporated. Our model also included age as a critical factor. Both univariate and multivariate analyses indicated that NPM was more common in younger women, whereas IDC was more prevalent in older women. These findings are consistent with those of previous studies (32,33).

FIGURE 1 ROC curves of derived serologic markers for identification of NPM vs. IDC (A) or FA (B). NLR, neutrophil to lymphocyte ratio; PLR, platelet to lymphocyte ratio; LMR, lymphocyte to monocyte ratio; dNLR, derived neutrophil-lymphocyte ratio; AFR, albumin-to-fibrinogen ratio; PNI, prognostic nutritional index; SII, systemic immune-inflammation index; AISI, aggregate index of systemic inflammation; NLPR, neutrophil-to-lymphocyte platelet ratio; SIRI, systemic inflammation response index; NMR, neutrophil-monocyte ratio; NER, neutrophil-to-eosinophil ratio; ROC, receiver operating characteristic curves; AUC, area under ROC curve.
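The sensitivity and specificity quoted for the NPM versus IDC model can be checked with simple arithmetic. The text does not report a confusion matrix, so the counts below are an assumption back-calculated from the reported percentages and the training-set group sizes (44 per group): 37 of 44 and 38 of 44 are the fractions consistent with 84.09% and 86.36%. Which diagnosis is treated as the positive class is also assumed here, and the function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of positive cases correctly identified
    specificity = tn / (tn + fp)  # fraction of negative cases correctly identified
    return sensitivity, specificity


# Assumed counts (not reported in the text): 44 positives and 44 negatives.
sens, spec = sensitivity_specificity(tp=37, fn=7, tn=38, fp=6)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```

Under these assumptions, about 7 positive and 6 negative cases in a group of 88 would be misclassified at the chosen threshold.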
Although the exact etiology of NPM remains unclear, it is a benign inflammatory disease different from IDC (20). Therefore, blood cell indices can serve as indicators of systemic inflammation and can differentiate between NPM and IDC. Our data showed that NPM patients had significantly higher WBC counts than IDC patients. The WBC count is a non-specific marker of inflammation and can indicate active bacterial infection. Patients with IDC rarely present with active bacterial infections. Thus, WBC count can be used to distinguish NPM from IDC.
This study has certain limitations. First, it was a single-center retrospective study with a relatively small cohort owing to the low incidence rate of NPM, and large-scale multicenter studies are needed to thoroughly validate the reliability and clinical value of our model in the future. Second, the laboratory indicators included were not sufficiently comprehensive, as some were missing from the IDC cases. Nonetheless, based on these simple indicators, a model with acceptable efficiency was constructed.
In conclusion, we successfully developed a model that can effectively differentiate NPM from IDC. For the clinical differential diagnosis of NPM, this model can serve as an effective adjunct to imaging examinations and may help avoid unnecessary biopsies. Furthermore, incorporating this model may improve the current laboratory diagnostic criteria recommended by the NPM guidelines.

FIGURE 2 Screening of key indicators for building the model to distinguish NPM from IDC and its performance evaluation using the training set. (A) LASSO logistic regression model; (B) Results of univariate logistic regression and multivariate logistic regression; (C) ROC curves; (D) Decision curve analysis; (E) Calibration curves.

FIGURE 4 Performance testing of the models in the validation set and diagnostic performance parameters in the training and validation sets. (A) ROC curves and calibration curves for model 1 used to distinguish between NPM and IDC in the validation set; (B) ROC curves and calibration curves for model 2 used to distinguish between NPM and FA in the validation set; (C) Diagnostic performance parameters for model 1 and model 2 in the training and validation sets.

TABLE 1
The biochemical indicators, routine blood parameters, and coagulation indicators of NPM, IDC, and FA patients.