Leveraging machine learning to develop a postoperative predictive model for postoperative urinary retention following lumbar spine surgery

Introduction Postoperative urinary retention (POUR) is the inability to urinate after a surgical procedure despite having a full bladder. It is a common complication following lumbar spine surgery and has been extensively linked to increased patient morbidity and hospital costs. This study aims to develop and validate a predictive model for POUR following lumbar spine surgery using patient demographics and surgical and anesthesia variables.
Methods This is a retrospective observational cohort study of 903 patients who underwent lumbar spine surgery from June 2017 to June 2019 at a tertiary academic medical center. Four hundred and nineteen variables were collected, including patient demographics, ICD-10 codes, and intraoperative factors. Least absolute shrinkage and selection operator (LASSO) regression and logistic regression models were compared. A decision tree model was fitted to the optimal model to classify each patient's risk of developing POUR as high, intermediate, or low. Predictive performance was assessed by the area under the receiver operating characteristic curve (AUC-ROC).
Results A total of 903 patients were included, with average age 60 ± 15 years and body mass index 30.5 ± 6.4 kg/m2; 476 (53%) were male, 785 (87%) were white, and 446 (49%) had surgery involving fusion, with an average of 2.1 ± 2.0 levels. POUR occurred in 235 patients (26%), with 63 (7%) requiring indwelling catheter placement. A decision tree was constructed with an accuracy of 87.8%.
Conclusion We present a highly accurate and easy-to-implement decision tree model that predicts POUR following lumbar spine surgery using preoperative and intraoperative variables.


Introduction
Postoperative urinary retention (POUR) refers to a patient's inability to completely empty their distended bladder following surgery. POUR is a common complication across all surgical specialties, with an incidence of 5%-70% (1, 2). Following spine surgery, average rates of POUR range from 5% to 38% depending on the definition of POUR, study population, and surgical characteristics (1, 3-8). The occurrence of POUR leads to discomfort and the potential need for catheterization, factors that overtly impact patient well-being. POUR has also been extensively linked to increased risk for serious complications such as urinary tract infection and sepsis, as well as increased length of stay, higher medical costs, and increased rates of readmission to the hospital (4, 5, 9-11). Beyond its immediate effects on patient well-being and comfort, POUR has been found to lower patient satisfaction, with patients who experienced POUR being less likely to be satisfied with spine surgery even at long-term follow-up (11).
Several patient-specific risk factors have been associated with the development of POUR following lumbar spine surgery, with age and male sex being the most frequently described (4, 5, 9, 10, 12). Likewise, numerous surgical factors such as operative time, number of operative levels, and fusion/surgical instrumentation have been associated with POUR (4, 5, 11-13). While dozens of factors have been analyzed, few of these analyses have produced actionable plans for identifying patients at greatest risk for POUR beyond single-variable analysis. These univariate approaches fail to adequately capture the complex interactions of patient and surgical variables, which limits their predictive accuracy.
Machine learning has become widely popularized in the spine surgery literature over the past decade, with applications in the diagnosis of spinal conditions and the prediction of surgical complications and outcomes (14). Previously, our group published a highly accurate model using preoperative variables to predict POUR through regression and neural network analysis (15); however, it did not account for intraoperative and perioperative variables during anesthesia, such as administration of narcotics, that have been demonstrated to affect a patient's likelihood of developing POUR (16-18). Herein, we present a comprehensive machine learning approach for the identification and classification of patients at risk for POUR following lumbar spine surgery using patient, surgical, and anesthesia variables. We hypothesize that the inclusion of a greater spectrum of variables will increase the fidelity of the predictive model. Practically, this would enable the surgical team to better identify patients at greatest risk for POUR, proactively adjust expectations, and arrange for proper monitoring and mitigating strategies.

Study design
We performed a retrospective review of consecutive patients who underwent spine surgery at our tertiary care academic medical center from June 2017 to June 2019. Patients were identified for inclusion in the database by query of CPT codes specific to lumbar spine operations: 22533, 22534, 22558, 22585, 22612, 22614, 22630, 22633, 22634, 63005, 63012, 63017, 63030, 63035, 63042, 63047, 63048, 63056, 63057. Patients were excluded if their surgery was not done through the clinic setting, if they had surgery in a non-lumbar region (i.e., thoracic or cervical level), or if they were <18 years old. Study design and data security methods were approved by our Institutional Review Board under protocol #201902403.

Identification of variables
The data were retrospectively collected from charted demographic information, nursing and anesthesia reports, and neurosurgical operative reports. Preoperative variables included age, body mass index (BMI), and pre-surgical use of opioids or urinary retention medication (i.e., 5-alpha reductase inhibitors and/or alpha inhibitors). International Classification of Diseases (ICD) codes preexisting the surgical visit were collected from the electronic health record (EHR) as well as Epic's Care Everywhere® feature, a network connecting UF Health's EHR to hundreds of other EHRs utilizing the Epic system (Epic Systems Corporation, Verona, Wisconsin). Intraoperative and postoperative variables were chosen based on previous studies and clinical suspicion of relevance (1, 9, 11-13, 15, 17-22). Intraoperative surgical variables included duration of surgery, indwelling catheter use, type of surgery (discectomy, laminectomy, and/or fusion), type of fusion if relevant, pelvic screw placement, number of levels, use of minimally invasive techniques, and surgical approach. Intraoperative anesthesia variables included total intravenous fluid administration, total volume of blood products transfused, and all medications administered during the surgical procedure.

Definition of POUR
Patients were monitored in the neuroscience intensive care unit, post-anesthesia care unit, and neurosurgical floor unit for failure to void and distended or painful bladders. Indwelling urinary catheters were placed intraoperatively for cases with expected surgery duration exceeding 3 h. In the absence of indwelling catheters, urine volume was determined per standard of care with nurse-led bladder scanning. POUR was defined as the reinsertion of indwelling urinary catheter, or the need for straight catheterization for urine volumes exceeding 400 mL on bladder scan (23, 24). Bladder scan was done with ultrasound in standard fashion. Timing of postoperative removal of the indwelling urinary catheter occurred at the discretion of the surgeon.

Statistical analysis

Variable selection
Four hundred and nineteen variables were collected, including patient characteristics, ICD-10 codes, and intraoperative factors. Only patients with complete data sets were included in the analysis. To set up a model for predicting POUR, variables were selected in two steps. In the first selection stage, all variables were subjected to univariate analysis to reveal patterns of association with POUR. Mann-Whitney U-tests were used for continuous and nominal variables while chi-square tests were used for categorical variables. Following this analysis, variables were selected depending on statistical significance and refined based on previous literature (2, 15). In the second stage, a LASSO regression approach, based on penalized regression, was applied to obtain shrinkage estimators, and only variables whose coefficients did not shrink to 0 were kept. The data were randomly split into training (80%) and validation (20%) sets. The training set was used to develop models to predict POUR. The validation set was used to evaluate the performance of the prediction models fitted on the training data.
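For readers less familiar with LASSO, the shrinkage step can be written as a penalized estimation problem. The form below is a sketch that assumes the binary POUR outcome is modeled with a logistic likelihood; the exact loss used is not stated in the text, so this is an assumption, and the symbols ($y_i$ for the POUR outcome of patient $i$, $x_i$ for that patient's candidate variables, $\beta_0$ and $\beta$ for the coefficients, $\lambda$ for the penalty weight) are introduced here only for illustration.

```latex
(\hat{\beta}_0, \hat{\beta}) =
\arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n}\sum_{i=1}^{n}
  \Big[\, y_i\,\big(\beta_0 + x_i^{\top}\beta\big)
        - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

Because the absolute-value penalty is not differentiable at zero, a sufficiently large $\lambda$ sets some coefficients $\hat{\beta}_j$ exactly to 0. The variables with nonzero $\hat{\beta}_j$ are the ones retained for the next modeling step.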

Predictive modeling
In building the predictive models, a logistic regression model was first fitted to predict POUR using the selected variables. The area under the curve (AUC) on both the training and validation datasets was assessed to show performance. Then, the predicted probability of having POUR was calculated from the logistic regression model for all patients in the training and validation sets. Based on the distribution of outcomes found in prior modeling based on preoperative risk factors, we defined the top 11% of predicted probabilities as high risk, the middle 74% as intermediate risk, and the bottom 15% as low risk (15). Using the risk levels as the outcome, a decision tree model was fitted to classify each patient's risk level in the training set. Five-fold cross-validation was used for hyperparameter tuning of minimum split and maximum depth. The accuracy of the decision tree was calculated on the validation set for performance evaluation. The Brier score (a measure of the accuracy of probabilistic predictions) was used to compare the forecasting ability of each aspect of the model, where a lower score indicates better-calibrated predictions (25). All statistical analyses were performed using SAS statistical software.
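The risk stratification and Brier score described above can be illustrated with a minimal sketch. It is written in Python rather than SAS, so it is not the analysis code used in this study. The function names, the rounding rule for group sizes, and the example probabilities are all hypothetical, and ties in predicted probability are broken arbitrarily by sort order.

```python
def assign_risk_tiers(probs, high_frac=0.11, low_frac=0.15):
    """Label each predicted probability 'high', 'intermediate', or 'low'.

    The highest `high_frac` of patients are labeled 'high', the lowest
    `low_frac` are labeled 'low', and the remainder 'intermediate'.
    Group sizes are rounded to whole patients (an assumption of this sketch).
    """
    n = len(probs)
    # Patient indices ordered from highest to lowest predicted probability.
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    n_high = round(n * high_frac)
    n_low = round(n * low_frac)
    tiers = ["intermediate"] * n
    for i in order[:n_high]:
        tiers[i] = "high"
    for i in order[n - n_low:]:
        tiers[i] = "low"
    return tiers


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical predicted probabilities for 100 patients (not study data).
probs = [i / 100 for i in range(100)]
tiers = assign_risk_tiers(probs)
print(tiers.count("high"), tiers.count("intermediate"), tiers.count("low"))
```

In this sketch the tier labels, not the raw probabilities, would then serve as the outcome for the decision tree. A Brier score of 0 corresponds to perfect probabilistic predictions, and a constant prediction of 0.5 yields 0.25.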

Surgical characteristics
The differences in rates of POUR based on surgical variables are shown in Figure 3. There were multiple significant surgical predictors of POUR. Rates of POUR were significantly higher in patients with surgeries involving fusion (+18.4%, p < 0.001) or laminectomy (+13.2%, p < 0.001). Rates of POUR were also higher in patients who underwent multilevel laminectomy (+22.1%, p < 0.001) or multilevel fusion (+24.1%, p < 0.001). Intraoperative indwelling urinary catheter placement (+20.1%, p < 0.001) was a strong predictor of POUR. Similarly, there was a significant difference in the likelihood of developing POUR in patients who underwent surgery involving posterolateral fusion (+18.8%, p < 0.001), pelvic screw placement (+15.9%, p = 0.014), or interbody fusion (+9%, p < 0.003). Conversely, rates of POUR were significantly lower in patients whose surgery

Anesthesia characteristics
A total of 69 variables were extracted and analyzed from intraoperative charts including muscle relaxants, reversal agents,

FIGURE 3
Bar graph of the differences in the rates of POUR based on categorical surgical variables for patients who underwent lumbar spine surgery. Frequency (n) and p-values comparing those who did and did not develop POUR. Asterisk (*) indicates p < 0.05 in chi-square test.

Following initial univariate analysis of patient, surgical, and anesthesia-related factors, 94 variables were selected for LASSO regression, of which 13 variables did not shrink to 0. The LASSO regression model achieved an AUC of 0.676 on the testing set on the receiver operating characteristic (ROC) curve (training set AUC 0.743). The areas under the precision-recall curve (AUC-PRC) were 0.332 and 0.560 for the testing and training sets, respectively. After the model selection step, 14 variables including patient, surgical, and anesthesia factors were isolated and included in logistic regression (Table 2). The logistic regression outperformed the LASSO regression model with an AUC-ROC of 0.737 (training set AUC 0.768; Figure 4). The AUC-PRC values for this model on the testing and training sets were 0.614 and 0.402, respectively. After hyperparameter tuning with the predictors selected from the LASSO regression model, a decision tree model was constructed (Figure 5). The accuracy of the final decision tree model was 87.8% on a 3-class confusion matrix (which reduces to 70.9% on a confusion matrix excluding the intermediate category), with sensitivity 91.3%, specificity 55.2%, positive predictive value 61.0%, and negative predictive value 89.2%. The Brier score was 0.19.
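As a reminder of how the reported performance measures relate to a confusion matrix, the following minimal Python sketch computes them from raw counts. The counts and the 3-class matrix below are hypothetical values chosen for easy arithmetic and are not the study's confusion matrices. The function names are likewise illustrative.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard measures from a 2x2 confusion matrix (e.g., high vs. low risk)."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


def multiclass_accuracy(matrix):
    """Accuracy for a square confusion matrix (rows = actual, columns = predicted)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=40, fp=20, fn=10, tn=30)
three_class = [
    [5, 1, 0],    # actual high
    [1, 20, 1],   # actual intermediate
    [0, 1, 6],    # actual low
]
acc3 = multiclass_accuracy(three_class)
print(m, acc3)
```

The sketch also shows why a 3-class accuracy can exceed the 2-class accuracy for the same model: the intermediate category usually contains most patients, so correct intermediate classifications dominate the diagonal of the 3-class matrix.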

Discussion
POUR is an incompletely understood but frequently encountered barrier to patient recovery and satisfaction following lumbar spine surgery, occurring in approximately one quarter of patients. Its pathogenesis is thought to be related to several factors including anesthetic agents, perioperative medications, and postoperative pain, all of which can alter the complex urinary signaling pathway. Anesthetics can act centrally at the pontine micturition center and peripherally as smooth muscle relaxants to decrease bladder contractility (26). Surgical pain or inadequate pain control further stimulates the sympathetic nervous system, which acts to inhibit the detrusor muscle (27). Medications such as opioids are known to play dual functions by inhibiting parasympathetic and stimulating sympathetic innervation (28). Thus far, no highly reliable and easily available prediction tools have been developed to identify a priori who is at increased risk for its development. Here, we present a model leveraging machine learning to classify the risk of a patient developing POUR following lumbar spine surgery using patient, surgical, and anesthesia characteristics. Using machine learning, we were able to condense more than 90 variables associated with POUR in univariate analysis to a 14-variable logistic regression model and eventually constructed an eleven-node decision tree after hyperparameter tuning with the predictors selected from the LASSO regression model, with a final accuracy for the decision tree model of 87.8% on a confusion matrix and an AUC-ROC of 0.737. This accuracy outperforms all previously available models and hence offers a novel and improved predictive tool for POUR.
The incidence of POUR in our study was 26%, well within the range (5.6%-38%) reported across diverse studies of lumbar spine surgery (3, 5-7, 9, 12, 13). Previous studies have applied extensive inclusion and exclusion criteria in their models of POUR. We chose to include a heterogeneous patient population in our analysis to better understand how we can comprehensively evaluate the lumbar spine surgery population for the development of POUR. By utilizing the logistic and LASSO regression models, we constructed a decision tree that outperforms any prior predictive tool, with an accuracy of 0.878.

Limitations and future aims
Our model has limitations. As with all algorithms, it is only as accurate as the data it contains. In this case, it is derived from a large tertiary care referral center where comprehensive data about a patient's past medical and surgical history may not be complete. We minimized this variability by extracting the medical history of patients from Epic's Care Everywhere network (Epic Systems Corporation), which accesses patients' medical charts from hundreds of other healthcare organizations, not exclusively our hospital's electronic medical record. Likewise, the study was retrospectively designed, which carries the biases inherent to a retrospective study.
This study was aimed at prediction of POUR, and not at interpretation of component variables. It serves as a diagnostic tool for POUR instead of identifying the critical variables that cause it. It can be tempting to elaborate on the meaning of predictors featured in the final model; however, these specific predictors are likely confounded by extensive patient and surgical variables and would warrant further prospective investigation. For factors such as phenylephrine (used for intraoperative blood pressure augmentation), a feasible alternative that is not associated with POUR, regardless of causality between the factor and POUR, might not exist. However, the use of intraoperative urinary catheters, which appears to be statistically significant in all models, presents a potentially modifiable variable. While this variable is extensively confounded by surgical time and associated anesthesia requirements via medications and fluids, it remains important to investigate. Additionally, further improvement in the predictive capabilities of this model may be achieved by including baseline bladder/urologic functional status and preoperative urologic medication requirements.

Conclusion
In conclusion, we describe a highly accurate postoperative predictive model for POUR following lumbar spine surgery using diverse preoperative and operative (surgical and anesthesia) variables. We were able to leverage machine learning to develop a 14-variable logistic regression model with an ROC-AUC of 0.737 and a decision tree model with an accuracy of 87.8%. These models substantially outperform previously published models of POUR in this patient population and include a greater spectrum of variables to highlight the effect of many less frequently appreciated variables. Furthermore, the final decision tree model is easy to implement clinically and can be put forth toward further studies aimed at preventing POUR following lumbar spine surgery. A prospective, multi-center study is needed to further validate our prediction model.

FIGURE 2
Bar graph of the differences in the rates of POUR based on preoperative clinical characteristics for patients who underwent lumbar spine surgery. Frequency (n) and p-values comparing those who did and did not develop POUR. Asterisk (*) indicates p < 0.05 in chi-square tests. BMI, body mass index.

FIGURE 4
Receiver operating characteristic (ROC) curves and precision-recall curves (PRC) for the logistic regression model. AUC, area under curve. (A) ROC-AUC for training set. (B) PRC-AUC for training set. (C) ROC-AUC for testing set. (D) PRC-AUC for testing set.

TABLE 1
Selected anesthesia variables found to have statistically significant differences between the group of patients that developed POUR and the group of patients that did not develop POUR.

TABLE 2
Multivariate logistic regression analysis for the development of the POUR model.
SE, standard error of the coefficient; IV, intravenous; RBC, red blood cells. Bold values are statistically significant, defined as p-value < 0.05.