Prediction models for chronic postsurgical pain in patients with breast cancer based on machine learning approaches

Purpose This study aimed to develop prediction models for chronic postsurgical pain (CPSP) after breast cancer surgery using machine learning approaches and to evaluate their performance. Methods The study was a secondary analysis based on a high-quality dataset from a randomized controlled trial (NCT00418457), including patients with primary breast cancer undergoing mastectomy. The primary outcome was CPSP at 12 months after surgery, defined as modified Brief Pain Inventory > 0. The dataset was randomly split into a training dataset (90%) and a testing dataset (10%). Variables were selected using recursive feature elimination combined with clinical experience, and potential predictors were then incorporated into three machine learning models (random forest, gradient boosting decision tree, and extreme gradient boosting) as well as a logistic regression model for outcome prediction. The performances of these four models were tested and compared. Results A total of 1152 patients were finally included, of whom 22.1% developed CPSP at 12 months after breast cancer surgery. The 6 leading predictors were a higher numerical rating scale score within 2 days after surgery, post-menopausal status, urban medical insurance, a history of at least one operation, fentanyl with sevoflurane general anesthesia, and axillary lymph node dissection. Compared with the multivariable logistic regression model, the machine learning models showed better specificity, positive likelihood ratio, and positive predictive value, helping to identify high-risk patients more accurately and create opportunities for early clinical intervention. Conclusions Our study developed prediction models for CPSP after breast cancer surgery based on machine learning approaches, which may help to identify high-risk patients and improve patients’ management after breast cancer surgery.


Introduction
Breast cancer is the most common cancer in women. Although the ten-year survival rate of breast cancer has reached 82% (1,2), 20% to 60% of surviving patients still experience chronic postsurgical pain (CPSP) after breast cancer surgery, resulting in a reduced quality of life and functional impairments (3)(4)(5). Predicting the risk of CPSP after breast cancer surgery can help clinicians identify those with a higher risk of CPSP and lead to earlier therapeutic interventions. In addition, identifying patients with a lower risk of CPSP could prevent unnecessary therapy, saving limited medical resources.
Numerous factors have been found to be associated with CPSP after breast cancer surgery in the past decade, including sociodemographic, intraoperative, and postoperative factors (6)(7)(8). Several models to predict CPSP have also been developed, mostly in European breast cancer patients (9)(10)(11)(12). However, despite the acceptable discrimination and calibration of those models, their low clinical utility, especially positive predictive values (PPVs) around 0.2 at a 20% to 60% risk level of CPSP, limits their application in clinical practice, and it remains difficult to identify high-risk patients early (9)(10)(11)(12). Moreover, genetic, cultural, and social backgrounds differ between ethnic groups and may play an important role in this complex pathophysiological condition, so the extrapolation of those tools to Asian patients may be limited.
Machine learning is a form of artificial intelligence that uses computer algorithms to identify nonlinear data patterns within large datasets to formulate outcome predictions. It has shown improved prediction performance compared with traditional prediction methods and may therefore be more suitable for the prediction of CPSP (13, 14). For example, random forest (RF) is a meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. Gradient boosting decision tree (GBDT) builds a prediction model in the form of an ensemble of weak prediction models; it has strong generalization ability and performs well in both classification and regression tasks. Extreme gradient boosting (XGBoost) is an algorithm built on the GBDT framework that handles missing data efficiently and flexibly (15). These three algorithms are tree-based ensemble methods rather than deep learning algorithms (the branch of machine learning designed to mimic the neurological structure of the human brain (16)), but they may still be more powerful than traditional algorithms in data analysis and prediction. Therefore, in this study, we selected these three algorithms with good fitting ability to develop prediction models for CPSP in Asian patients with breast cancer and compared these models with logistic regression, hoping to improve the performance and clinical utility of the prediction models.
Contributions of this study: (1) We used machine learning approaches to develop prediction models, intending to improve their clinical utility and help clinicians identify high-risk CPSP patients more accurately and confidently.
(2) Considering that genetic, cultural, and social backgrounds differ between ethnic groups and may play an important role in CPSP, we focused on Chinese patients to explore predictors and models more suitable for the Chinese population.

Study population
The study was a secondary analysis based on the dataset from the Chinese center of a multicenter randomized controlled trial (RCT, NCT00418457), conducted from 2014 to 2016, which has previously been described in detail (17)(18)(19). We enrolled women younger than 85 years with primary breast cancer without known extension beyond the breast and axillary nodes (i.e., believed to be tumor stage 1-3, nodes 0-2) who were scheduled either for unilateral or bilateral mastectomy, with or without implants, or for wide local excision with node dissection. We excluded women who had previous surgery for breast cancer (we allowed diagnostic biopsies and guide-wire insertion), had inflammatory breast cancer, were scheduled for free-flap reconstruction, had American Society of Anesthesiologists (ASA) physical status of IV or higher, had contraindications to either anesthetic approach, or had other cancer not in long-term remission.
All the surgeries were conducted by the same surgical team, and the perioperative analgesia was standardized. In the original trial, patients were randomly assigned to either opioid analgesia (under fentanyl with sevoflurane general anesthesia, GA group) or paravertebral blocks (under paravertebral block with propofol general anesthesia, PPA group). Tramadol was the first-line postoperative analgesic in both study groups. Analgesia at home during the first postoperative week consisted of ibuprofen, acetaminophen, or a combination of acetaminophen and codeine.

Outcomes
The primary outcome was CPSP at 12 months after breast cancer surgery. According to the 2016 International Association for the Study of Pain (IASP) criteria (20), CPSP is defined as pain that occurs after surgical intervention and lasts for at least three months, excluding other potential causes (e.g., cancer recurrence and infection). In our study, patients with a modified Brief Pain Inventory (mBPI) score > 0 in the surgical area (breast, axilla, and arm) at 12 months after breast cancer surgery were considered to have developed CPSP (21).
Outcomes of breast cancer were also recorded within 12 months, including 1) recurrence of breast cancer in the ipsilateral breast, thoracic wall, and axillary tissue with pathological confirmation; 2) distant metastasis, including the occurrence of breast cancer in the contralateral breast or any other remote organs with pathological confirmation, or multiple lesions consistent with metastases found on imaging examination; and 3) death from any cause.

Data acquisition
Baseline characteristics including demographics, medical insurance level, preoperative data, surgical data, pathology data, and adjuvant therapies after surgery were acquired from the dataset of the RCT (NCT00418457). In addition, pain-related data including the presence of persistent pain of any kind, preoperative pain in the operative area (breast, axilla, and arm), opioid (fentanyl) consumption during surgery, and postoperative pain intensity ratings within 2, 24, and 48 hours after surgery were also recorded in the dataset. Postoperative pain intensity was categorized according to the verbal numerical rating scale (NRS) as no pain (NRS 0), mild pain (NRS 1 to 3), or moderate-to-severe pain (NRS 4 to 10).
Outcome observation and follow-up information were also obtained from the dataset of the RCT. All follow-ups were performed by one qualified investigator unaware of the patients' random assignments and intraoperative management. To ensure a high proportion of successful follow-up (> 99%), the investigator tried to contact not only the patients but also their families and caregivers, making at least 3 attempts at each follow-up time point (30 days, 3 months, 6 months, and 12 months after surgery). Patients who withdrew from the RCT, were lost to follow-up, or had missing data were excluded from our study.
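The NRS grouping described above can be written as a small lookup. This is a minimal sketch; the function name `categorize_nrs` and the category labels as strings are illustrative assumptions, not taken from the trial dataset.

```python
def categorize_nrs(score: int) -> str:
    """Map a verbal NRS score (0-10) to the three pain categories in the text."""
    if not 0 <= score <= 10:
        raise ValueError("NRS score must be between 0 and 10")
    if score == 0:
        return "no pain"
    if score <= 3:
        return "mild pain"          # NRS 1 to 3
    return "moderate-to-severe pain"  # NRS 4 to 10


# Example: categorize a few hypothetical postoperative ratings.
example_categories = [categorize_nrs(s) for s in (0, 2, 7)]
```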

Statistical analysis
The full dataset was randomly split into a training dataset (90%) and a testing dataset (10%). Feature selection and model development were performed in the training dataset, and the validation and evaluation of the models were performed in the testing dataset. Continuous variables were converted to restricted cubic splines for better fitting (22).
Recursive feature elimination (RFE) (23) is a feature selection method that fits a model and removes the weakest features until the specified number of features is reached. Features are ranked by the model's coefficients or feature importance attributes, and RFE attempts to eliminate dependencies and collinearity that may exist in the model by recursively eliminating a small number of features per loop. In this study, we used RFE for variable selection and pre-set 5 as the minimum number of variables. Two clinical experts then identified the final variables to be incorporated into the prediction models based on the results of RFE, clinical experience, and risk factors mentioned in most studies (11,12,24,25). In addition, we analyzed the contribution (gain) of each identified variable to 12-month CPSP.
In the model-development phase, four prediction models (conventional logistic regression, RF, GBDT, and XGBoost) were constructed with the identified variables for CPSP prediction; an overview of these four models is shown in Table 1. Ten-fold grid-search cross-validation (26) was used to select the best tuning parameters for the RF, GBDT, and XGBoost models. The performances of these four models were validated and evaluated using the area under the receiver operating characteristic curve (AU-ROC) for discrimination and the integrated calibration index (ICI), E50, E90, and the Hosmer-Lemeshow test for calibration (27). The diagnostic accuracy, including sensitivity, specificity, positive and negative likelihood ratios (PLR and NLR), and positive and negative predictive values (PPV and NPV), was also calculated to compare the clinical validity of the different models.
All statistical analyses were performed with R 4.0.2 and Python 3.8.0. A 2-sided P-value less than 0.05 was considered the threshold for statistical significance.
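A random 90%/10% split of this kind can be sketched with the standard library alone. The function name `split_train_test`, the seed, and the use of row indices as patient identifiers are assumptions for illustration; the text does not state the seed or whether the split was stratified by outcome.

```python
import random


def split_train_test(ids, test_fraction=0.10, seed=0):
    """Shuffle the identifiers and hold out `test_fraction` of them for testing."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumed)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# With the 1152 included patients, a 10% hold-out gives 115 test cases.
train_ids, test_ids = split_train_test(range(1152))
```

Fixing the seed keeps the partition reproducible, so that feature selection and model tuning never see the held-out cases.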
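The elimination loop described above can be sketched generically. This is a simplified illustration, not the implementation used in the study: `importance_fn` is a hypothetical callback that refits a model on the current feature subset and returns one importance value per feature, and the toy scores below are invented purely to exercise the loop.

```python
from typing import Callable, Dict, List, Sequence


def recursive_feature_elimination(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    min_features: int = 5,
    step: int = 1,
) -> List[List[str]]:
    """Return every feature subset visited, from the full set down to min_features."""
    remaining = list(features)
    history = [list(remaining)]
    while len(remaining) > min_features:
        importances = importance_fn(remaining)  # refit on the current subset
        n_drop = min(step, len(remaining) - min_features)
        # Drop the n_drop features with the smallest importance.
        weakest = sorted(remaining, key=lambda f: importances[f])[:n_drop]
        remaining = [f for f in remaining if f not in weakest]
        history.append(list(remaining))
    return history


# Toy importances (hypothetical); a real callback would refit a model each round.
toy_scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3, "e": 0.7, "f": 0.2, "g": 0.4}
subsets = recursive_feature_elimination(
    list(toy_scores), lambda fs: {f: toy_scores[f] for f in fs}, min_features=5
)
```

Scoring each subset in `subsets` (for example by cross-validated AUC) is what allows the discrimination at different numbers of retained variables to be compared.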
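The discrimination and diagnostic accuracy measures listed above can be computed directly from true outcomes and predicted probabilities. The sketch below assumes that a "risk level" corresponds to a predicted-probability cut-off (0.20 here); the function names and the toy data are illustrative only and do not reproduce the study's results.

```python
def auc(y_true, y_prob):
    """AU-ROC as the probability that a case outranks a non-case (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _ratio(num, den):
    """Division that yields NaN instead of raising when the denominator is zero."""
    return num / den if den else float("nan")


def diagnostic_accuracy(y_true, y_prob, threshold=0.20):
    """Sensitivity, specificity, likelihood ratios and predictive values at a cut-off."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        positive = prob >= threshold
        if positive and truth == 1:
            tp += 1
        elif positive:
            fp += 1
        elif truth == 1:
            fn += 1
        else:
            tn += 1
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PLR": _ratio(sens, 1 - spec),
        "NLR": _ratio(1 - sens, spec),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
    }


# Toy data (hypothetical): 3 patients with CPSP, 7 without.
toy_truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_prob = [0.9, 0.6, 0.1, 0.3, 0.1, 0.1, 0.05, 0.1, 0.15, 0.02]
toy_metrics = diagnostic_accuracy(toy_truth, toy_prob, threshold=0.20)
toy_auc = auc(toy_truth, toy_prob)
```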
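For the calibration measures named in this section, ICI, E50, and E90 are respectively the mean, median, and 90th percentile of the absolute difference between predicted probabilities and smoothed observed probabilities. The sketch below assumes the smoothing step (for example a loess fit of outcome on predicted probability) has already been done elsewhere; the function names and toy numbers are illustrative assumptions.

```python
import statistics


def percentile(values, q):
    """Percentile with linear interpolation between order statistics; q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def calibration_summary(predicted, smoothed_observed):
    """ICI, E50 and E90 from predicted and smoothed observed probabilities."""
    abs_diff = [abs(o - p) for p, o in zip(predicted, smoothed_observed)]
    return {
        "ICI": statistics.fmean(abs_diff),  # mean absolute difference
        "E50": percentile(abs_diff, 0.50),  # median absolute difference
        "E90": percentile(abs_diff, 0.90),  # 90th percentile
    }


# Toy values (hypothetical), standing in for the output of a smoother.
toy_predicted = [0.10, 0.20, 0.30, 0.40, 0.50]
toy_observed = [0.12, 0.18, 0.35, 0.30, 0.50]
toy_calibration = calibration_summary(toy_predicted, toy_observed)
```

All three values are zero for a perfectly calibrated model, so smaller numbers indicate better agreement between predicted and observed risk.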

Baseline characteristics
A total of 1152 patients were finally included in the study after excluding 8 who were lost to follow-up, 4 who withdrew from the RCT, 18 with missing data, 2 who died within 12 months, and 69 with postoperative recurrence within 12 months. The process of patient selection is shown in Figure 1. A total of 255 (22.1%) patients developed CPSP at 12 months after breast cancer surgery. According to the results of univariable analyses in the full dataset using logistic regression, age, medical insurance level, menstruation status, receipt of axillary lymph node dissection (ALND), surgical technique, and acute postoperative pain (within 2 days after surgery) differed significantly between patients with and without 12-month CPSP. Data are shown in Table 2.

Prediction model: Overview
Logistic regression: An extension of the linear regression model for classification problems, used to examine the association of categorical or continuous independent variable(s) with one binary outcome; more suitable for linear data patterns.
RF: An ensemble machine learning method for both classification and regression tasks that trains a large number of individual decision trees on various subsamples of the dataset and uses averaging to improve predictive accuracy and control over-fitting; more suitable for nonlinear data patterns.
GBDT: An iterative decision tree algorithm with strong generalization ability that combines several weak prediction models (predictors with poor accuracy) into a strong learner (a model with strong accuracy); performs well in both classification and regression tasks but is prone to over-fitting.
XGBoost: Built on the GBDT framework and designed for supervised learning tasks such as regression, classification, and ranking; uses a more regularized model formalization to control over-fitting.

Features selected in models
RFE was used for feature selection. As shown in Figure 2A, according to the results of RFE, better discrimination appeared when 12 or 6 variables remained (AUC was 0.720 and 0.735, respectively). Considering the clinical practicability of the models, we aimed to select as few predictors as possible when discrimination was similar, so we selected 6 variables: medical insurance level, menstruation status, history of any operation to any body region, anesthetic technique, pathology stage for nodes (stage N), and NRS within 2 days after surgery. According to clinical experience and previous research (3, 5, 25), ALND was more suitable as a predictor than stage N and replaced it, since ALND may cause nerve damage and is often preferred by breast cancer patients in China for fear of tumor recurrence, even by patients whose stage N is 0 or 1. In summary, urban medical insurance, including urban employee basic medical insurance (UEBMI), the urban resident medical insurance program for self-employed and unemployed urban residents (URMI), and commercial medical insurance (CMI), as well as post-menopausal status, a history of at least one operation, fentanyl with sevoflurane general anesthesia (GA group), ALND, and a higher NRS within 2 days after surgery were associated with a higher risk of 12-month CPSP after breast cancer surgery.
Furthermore, we analyzed the contribution (gain) of each feature selected above to 12-month CPSP. The importance of the features in descending order was NRS within 2 days after surgery > menstruation status > medical insurance level > history of operation > anesthetic technique > receipt of ALND. Details are shown in Figure 2B.

Models development and comparison
We incorporated the 6 predictors mentioned above to construct different prediction models using the training dataset, including RF, GBDT, and XGBoost models and a multivariable logistic regression model. The performances of these four models were then evaluated and compared.
The results indicated that machine learning models (RF/GBDT/XGBoost) had better discriminatory power compared with the multivariable logistic regression model [
Results presented as x ± s or n (%). ALND, axillary lymph node dissection; ASA, American Society of Anesthesiologists; BMI, body mass index; CI, confidence interval; CMI, commercial medical insurance; CPSP, chronic postsurgical pain; GA, fentanyl with sevoflurane general anesthesia; NCMS, the new cooperative medical scheme for rural residents; NRS, numerical rating scale; OR, odds ratio; PPA, paravertebral block with propofol general anesthesia; UEBMI, the urban employee basic medical insurance; URMI, the urban resident medical insurance program for self-employed and unemployed urban residents. * Previous history of operation refers to any type of operation to any body region.
According to the general incidence of CPSP in breast cancer population, we evaluated the clinical validity of all the four prediction models at a 20% risk level of CPSP. As shown in

Discussion
Breast cancer is considered to be the most prevalent cancer in women. Survival has improved primarily due to earlier detection and improvements in therapeutic approaches (28). In our study, 22.1% of patients developed CPSP at 12 months after breast cancer surgery, which could reduce their quality of life (3-5). At present, more and more prediction models are being established to identify high-risk CPSP patients in advance (9-12). Sipilä R et al. prospectively created a 6-factor risk index to predict persistent pain at 6 months after surgery using a Bayesian model in a Finnish population (9). Another 4-item preoperative risk score for persistent pain at 4 months after surgery was developed with a multivariable logistic regression model in Switzerland (10). Meretoja TJ et al. created a web-based risk calculator using logistic regression analyses to assess the risk of persistent pain at 1 year after surgery in European breast cancer cohorts (11). However, most of these models were developed in European patients, studies on Asian patients are lacking, and the low clinical utility of these models limits their application in clinical practice. Machine learning is a form of artificial intelligence that uses computer algorithms to identify nonlinear data patterns within large datasets to formulate outcome predictions (13), which may make it more suitable for the prediction of CPSP. Considering the poor performance of the machine learning algorithms chosen in previous studies, we selected three representative machine learning methods with good fitting ability (RF, GBDT, and XGBoost) to develop prediction models, based on a high-quality dataset and rigorous model evaluation, intending to improve their clinical utility.
Our results suggested that, although the superiority of the machine learning models over the logistic regression model in discrimination and calibration was not particularly prominent, they performed better in specificity, PLR, and PPV. We found that the posterior positive probability, reflected by PPV, was around 0.8 using machine learning methods, which is a large increase from the prior probability of about 0.2 (22.1%). That means the probability of developing CPSP is as high as 80% if a patient is categorized as high risk by our prediction model. This could improve the models' clinical utility and help clinicians identify high-risk CPSP patients more accurately and confidently, so that appropriate treatments can be started promptly. Considering the high quality and low loss to follow-up rate (<1%) of our dataset, these results are reliable and convincing.
According to the feature selection, acute postoperative pain, reflected by a higher NRS within 2 days after surgery, is one of the risk factors for CPSP, consistent with previous research (29). It has been reported that paravertebral block could improve early postoperative analgesia and prevent CPSP by affecting the transition from acute to chronic pain (7,8,30). This may explain why patients under paravertebral block with propofol general anesthesia (PPA group) had a lower risk of CPSP compared with those under fentanyl with sevoflurane general anesthesia (GA group).
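The step from a prior probability of about 0.2 to a posterior probability of about 0.8 follows Bayes' rule in odds form: posterior odds = prior odds × PLR. The sketch below works this backwards to show the size of likelihood ratio that such a shift implies. It treats the quoted "around 0.8" as an approximate input, so the resulting figure is an arithmetic illustration rather than a reported result, and the function names are assumptions.

```python
def implied_positive_likelihood_ratio(ppv, prevalence):
    """PLR that turns the prior odds (prevalence) into the posterior odds (PPV)."""
    posterior_odds = ppv / (1 - ppv)
    prior_odds = prevalence / (1 - prevalence)
    return posterior_odds / prior_odds


def ppv_from_likelihood_ratio(plr, prevalence):
    """PPV obtained by applying a PLR to the prior odds."""
    posterior_odds = plr * prevalence / (1 - prevalence)
    return posterior_odds / (1 + posterior_odds)


# Prior 22.1% (observed incidence) and PPV of roughly 0.8 (approximate, from the text).
plr_needed = implied_positive_likelihood_ratio(0.8, 0.221)  # roughly 14
ppv_check = ppv_from_likelihood_ratio(plr_needed, 0.221)    # recovers 0.8
```

Read the other way, a model whose PLR is close to 1 would leave the PPV near the 22.1% baseline, which is why PLR and PPV move together at a fixed incidence.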
Menstruation status was another important predictor for CPSP after breast cancer surgery. Our results showed that post-menopausal patients seemed to have a higher risk of CPSP. The decline of estrogen and various musculoskeletal and climacteric symptoms may be possible explanations (31). However, some reports suggested that a younger age was associated with a greater risk of CPSP (25). This might be related to differences in social background between ethnic groups. Post-menopausal women are generally over 55 years old and tend to be retired in China. In most cases, their children, now adults, have left home for work or study, so these women experience low hormone levels and the loneliness of being unaccompanied at the same time. This might lead to a more anxious mental status, which could cause or aggravate the perception of pain (32). Recently, Wang Y et al. also constructed prediction models of chronic pain after breast cancer surgery using a variety of machine learning techniques (33). However, these models not only have lower discriminatory power, sensitivity, and PPV, but also do not include psychosocial factors. Pain is a strongly subjective feeling that is affected by the disease itself and relevant clinical factors, as well as social factors, cultural backgrounds, psychological state, and so on. Therefore, in this study, we also took medical insurance level into consideration to represent patients' social backgrounds and reflect their psychosocial status.
Each cell contains the appropriate calibration metric and its 95% confidence interval. E50 and E90, the median and 90th percentile of the absolute difference between observed and predicted probabilities, respectively; GBDT, gradient boosting decision tree; H-L, Hosmer-Lemeshow test; ICI, integrated calibration index (a measure of calibration, which could be interpreted as the weighted difference between observed and predicted probabilities); RF, random forest; XGBoost, extreme gradient boosting.
According to the current forms of medical insurance in China, we divided patients' medical insurance level into no medical insurance, rural medical insurance, which is the new cooperative medical scheme (NCMS) for rural residents, and urban medical insurance, which includes UEBMI, URMI, and CMI (34). Our results demonstrated that patients with urban medical insurance had a higher risk of CPSP. Urban medical insurance can be taken to represent urban residents. They usually have a higher economic and educational level and pay more attention to their physical health, quality of life, psychological condition, and subjective feelings, and may therefore be more prone to an exacerbated perception of pain. Conversely, patients living in rural areas, often with NCMS, may bear a heavier burden of life, have a lower level of education and self-focus, and thus seldom pay attention to inconspicuous pain.
Our study also had several limitations. First, it was a secondary analysis based on the dataset from a single center of a multicenter RCT. Although the data in this study were accurate and complete with a low drop-out rate, the relatively strict inclusion and exclusion criteria of the original trial limited the external validity of our results. Second, although the machine learning models showed better specificity and PPV, their sensitivity and NPV were not good. Further studies are needed to improve the models' general performance. Third, our prediction models included factors specific to the Chinese context, such as medical insurance level, which might limit their global applicability. Finally, psychosocial factors may play an important role in CPSP after breast cancer surgery. Although some psychosocial factors were included in our analyses and model development, they were not sufficient; in particular, data on underlying anxiety were lacking. Studies incorporating a more comprehensive analysis of psychosocial factors should be conducted in the future.

Conclusions
This study developed prediction models for CPSP at 12 months after breast cancer surgery based on machine learning approaches, which could assist clinicians in identifying high-risk patients more accurately and conducting clinical interventions in advance. Using machine learning methods could be a novel approach to predict CPSP and help tailor precise management for patients with breast cancer, leading to a better prognosis and an increased quality of life.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Institutional Review Board of our hospital (Number: S-638). The patients/participants provided their written informed consent to participate in the study.

Author contributions
CS: data acquisition and analysis, manuscript preparation. ML: data acquisition and analysis, tables and figures preparation. LL: data acquisition and analysis, manuscript preparation. LP: study design, data acquisition and analysis, manuscript revision and guarantor. YZ: data analysis and manuscript revision. GT: data acquisition. ZZ: data acquisition. YH: study design and manuscript revision. CS, ML, and LL contributed equally to this study. All authors contributed to the article and approved the submitted version.
