Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling

For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools depends on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 through 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT for each record based on a previously described fixed ratio model, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity-related benefits.

Keywords: operating room utilization, procedure time, regression, prediction, anesthesia time, surgeon time, surgical time

INTRODUCTION
Operating rooms (ORs) are among the most valuable hospital assets, generating a large part of hospital revenue. Revenue per OR hour varies per procedure, but is estimated to be between $1,000 and $2,000 on average, before subtracting the variable costs of personnel and supplies related to hospitalization (1). This makes efficient utilization of ORs paramount. Every minute wasted may [...]
The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT; abbreviations are described in Table 1) per case. TPT consists of anesthesia-controlled time (ACT, itself consisting of the induction and emergence phases) and surgeon-controlled time (SCT, being the duration of the actual operation, including patient positioning and draping). ACT is included because in Dutch academic hospitals, the induction and emergence phases always take place in the OR, making them relevant to OR utilization.
Predicted TPTs are used to plan up to a desired level of utilization of the OR complex. Sequencing patient cases based on predicted TPT can help minimize the probability of underutilization of the OR and cancelation of procedures. Previous research has shown that using a fixed ratio to calculate TPT from SCT as estimated prior to an operation [estimated surgeon-controlled time (eSCT)] provides more accurate estimates than adding a fixed duration for ACT to eSCT to compute TPT (2). In this paper, we attempt to improve the accuracy of TPT predictions further by including patient and surgery characteristics relevant to TPT.

MATERIALS AND METHODS
We extracted data from a Dutch benchmarking database of all surgeries performed in all eight academic hospitals in The Netherlands from 2012 through 2016. Written informed consent from the patients was not required, because no individual patient data were included. The data contributed by two of these hospitals were excluded, because they contained only the observed and subsequently recorded SCT instead of the initially estimated SCT. The records of the other hospitals did not contain eSCT either, but did describe estimated TPT. We used this to approximate eSCT by subtracting 20 min, which is the default time allocated to ACT in many Dutch hospitals. Unfortunately, it was not feasible to determine the exact time attributed to anesthesia for each operation in each hospital. Subtracting 20 min gives approximate eSCTs that are sufficient for testing the methods described in this paper.
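To make this approximation concrete, the following minimal sketch (in Python for illustration only; the analyses in this study were performed in R) derives eSCT from estimated TPT. The field names and values are hypothetical and do not reflect the schema or contents of the benchmarking database.

```python
# Default anesthesia allowance mentioned in the text, in minutes.
DEFAULT_ACT_MIN = 20


def approximate_esct(estimated_tpt_min):
    """Approximate estimated surgeon-controlled time from estimated TPT."""
    return estimated_tpt_min - DEFAULT_ACT_MIN


# Hypothetical records; field names are illustrative assumptions.
records = [
    {"case_id": 1, "estimated_tpt_min": 95},
    {"case_id": 2, "estimated_tpt_min": 180},
]

for rec in records:
    rec["esct_min"] = approximate_esct(rec["estimated_tpt_min"])
```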
Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation (identified by unique codes as registered by the hospitals), American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used (again identified by hospital-supplied codes). Other database fields described observed TPT, anesthesia induction time, and anesthesia emergence time. Observed ACT was calculated by adding up induction and emergence durations. Only records describing elective surgery were included, because emergency surgery does not receive an estimated TPT/SCT. Data analysis and statistical calculations were performed in R version 3.3.1. Implausible or impossible data values, such as a 0 for observed TPT, were marked as missing data. As we suspected missing data in the database to have occurred completely at random, we omitted incomplete records from the analysis. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. The distribution of the characteristics within this dataset is shown in Tables 2 and 3. The data were split into a training set with records from the years 2012 through 2015 and a test set from 2016.
All variables of the linear regression models were highly significant predictors (p < 0.01), in part due to the size of the dataset, except for some of the levels of the factor variables for type of anesthesia and type of operation. These variables were nevertheless retained in the model, since the overall effect of the factor variables was significant. Ultimately, the best model was identified by examining when the adjusted R-squared showed only minimal improvement after adding additional predictors.
Of all models tested, TPT is most accurately predicted using a linear regression model based on all available independent variables. However, as can be seen in Tables [...]
An often-used rule of thumb states the need for at least 10 records for each potential predictor of TPT to be included in the model. Recent research suggests the actual number may be even lower (3). Considering that the dataset used for our analysis contained nearly 80,000 records, we had ample precision to test all potential predictors and interactions.
First, we computed for each record the predicted TPT based on the fixed ratio model described by van Veen-Berkx et al. (2): for each patient, the eSCT was multiplied by 1.33. This factor is based on their finding that 33% of SCT is generally a good approximation of ACT. Using both predicted and observed TPT, we computed the mean absolute error (MAE), the mean squared error (MSE), and model fit expressed as the adjusted R-squared of the model. The adjusted R-squared can be interpreted as the proportion of variance in TPT that can be explained by parameters in the model.
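The fixed ratio baseline and the three performance measures can be sketched as follows. This is an illustration in Python, not the R code used in the study, and the values are made up. The adjusted R-squared uses the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1); how many parameters were counted for the fixed ratio model is not stated in the text, so `n_predictors=1` is an assumption.

```python
def fixed_ratio_tpt(esct_min, ratio=1.33):
    """Fixed ratio model: predicted TPT = ratio * eSCT."""
    return ratio * esct_min


def mean_absolute_error(observed, predicted):
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_squared_error(observed, predicted):
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def adjusted_r_squared(observed, predicted, n_predictors):
    """1 - (1 - R^2) * (n - 1) / (n - p - 1), with R^2 = 1 - SS_res / SS_tot."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Made-up example values in minutes (not study data).
esct = [60, 90, 120, 200]
observed_tpt = [85, 115, 170, 250]
predicted_tpt = [fixed_ratio_tpt(x) for x in esct]

mae = mean_absolute_error(observed_tpt, predicted_tpt)
mse = mean_squared_error(observed_tpt, predicted_tpt)
adj_r2 = adjusted_r_squared(observed_tpt, predicted_tpt, n_predictors=1)
```

Because the MSE squares each error, it penalizes large misestimates more heavily than the MAE, which is why both measures are worth reporting side by side.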
All linear regression models were created using the 2012-2015 data and then validated on both this set and the 2016 set. This enabled us to separately measure the performance of the models on new data and compare this to their performance on the training data.
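The train-then-validate procedure can be illustrated with a small self-contained sketch. It is written in Python with a hand-rolled least-squares solver purely for illustration; the study used R, and a statistics package would normally be preferred. The records, the field names, and the single anesthesia dummy variable are hypothetical simplifications of the factor variables described in this paper.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def design_row(rec):
    """Intercept, eSCT, and a dummy for general anesthesia (hypothetical coding)."""
    return [1.0, float(rec["esct"]), 1.0 if rec["anesthesia"] == "general" else 0.0]


def predict(beta, rec):
    return sum(b * x for b, x in zip(beta, design_row(rec)))


def mae(beta, recs):
    return sum(abs(r["tpt"] - predict(beta, r)) for r in recs) / len(recs)


# Made-up records (minutes); not study data.
cases = [
    {"year": 2012, "esct": 60, "anesthesia": "general", "tpt": 99},
    {"year": 2013, "esct": 120, "anesthesia": "general", "tpt": 165},
    {"year": 2015, "esct": 180, "anesthesia": "general", "tpt": 243},
    {"year": 2012, "esct": 30, "anesthesia": "regional", "tpt": 43},
    {"year": 2014, "esct": 90, "anesthesia": "regional", "tpt": 124},
    {"year": 2015, "esct": 150, "anesthesia": "regional", "tpt": 187},
    {"year": 2016, "esct": 100, "anesthesia": "general", "tpt": 150},
    {"year": 2016, "esct": 45, "anesthesia": "regional", "tpt": 60},
]

# Split by calendar year: 2012-2015 for fitting, 2016 held out.
train = [c for c in cases if c["year"] <= 2015]
test = [c for c in cases if c["year"] == 2016]

beta = fit_ols([design_row(c) for c in train], [c["tpt"] for c in train])
train_mae = mae(beta, train)
test_mae = mae(beta, test)
```

Splitting by calendar year rather than at random mimics how such a model would be used in practice: it is fitted on past cases and then applied to cases that have not yet taken place.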
We used the p-value of each variable and the adjusted R-squared values to test all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables.
As an additional alternative, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). This allowed us to compare our findings with various previous attempts to predict ACT (4,5).
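The two-step alternative can be sketched in simplified form. With a single factor variable as the only predictor, a linear regression with an intercept reproduces the mean of the outcome per factor level, so the illustration below predicts ACT as the mean ACT per anesthesia type. The records and field names are hypothetical, and adding the predicted ACT to the estimated (rather than the observed) SCT is an assumption about how the sum would be formed in planning practice.

```python
from collections import defaultdict


def fit_group_means(train, key, target):
    """Mean of `target` per level of `key` (the OLS fit for one factor variable)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in train:
        sums[rec[key]] += rec[target]
        counts[rec[key]] += 1
    return {level: sums[level] / counts[level] for level in sums}


# Made-up training records: observed ACT in minutes per anesthesia type.
train = [
    {"anesthesia": "general", "act": 30},
    {"anesthesia": "general", "act": 40},
    {"anesthesia": "regional", "act": 20},
    {"anesthesia": "regional", "act": 24},
]
act_by_type = fit_group_means(train, "anesthesia", "act")


def predict_tpt_via_act(esct_min, anesthesia):
    """Predicted TPT = eSCT + predicted ACT (assumption: eSCT stands in for SCT)."""
    return esct_min + act_by_type[anesthesia]


tpt = predict_tpt_via_act(90, "general")
```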
Finally, to test for any possible influence that the omission of the incomplete records might have had on our results, we reran the analyses after imputation of the missing data. Linear regression was used to impute the numeric variables and a proportional odds model for the ordered variable describing ASA classification. The type of anesthesia used and the type of surgery performed could not be imputed, due to the large number of categories.
These main outcomes are summarized in Table 6. Figure 1 displays plots of the predicted versus the actual TPTs for these three models.
After imputation of missing data in the initial dataset instead of elimination of incomplete records, all results were practically the same.

DISCUSSION
The improvement in TPT prediction of the best performing linear regression model versus the fixed ratio model was convincing. On the training data, the MSE was reduced by a quarter of the original value. This indicates that the variation in prediction errors was substantially reduced. As is to be expected, this effect was somewhat less pronounced on the 2016 testing data, but still very useful.
Making use of these more accurate predictions may help prevent the typical consequences of under- and overestimation. Underestimation can lead to costly overtime or even the cancelation of operations, while overestimation can lead to downtime of both the operating theater and its staff. For the hospital with the highest number of complete records in our dataset, totaling all the under- and overestimation of the included operations from 2016 results in a total overestimation of 3,118 h. Had they made use of a model as described in this paper (based on their own data), the total result would have been an overestimation of only 179 h. Depending on the way these hours would have been distributed in the scheduling, they may have led to additional operations being performed.
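A total of this kind can be computed as a signed sum of prediction errors, as in the sketch below. It assumes that the totals quoted above are net sums in which under- and overestimates offset each other; the values are made up and the function name is hypothetical.

```python
def net_estimation_error_hours(predicted_min, observed_min):
    """Sum of (predicted - observed) in hours; positive means net overestimation."""
    total_min = sum(p - o for p, o in zip(predicted_min, observed_min))
    return total_min / 60


# Made-up predicted and observed TPTs in minutes (not study data).
predicted = [120, 90, 200, 60]
observed = [100, 95, 170, 55]

net_hours = net_estimation_error_hours(predicted, observed)
```

Because positive and negative errors cancel in a net sum, a small total does not by itself imply small errors per case, so this figure is best read alongside the MAE and MSE.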
The accuracy of predicted durations of surgery also directly influences the confidence with which planners might increase the level of utilization of ORs. Planning for higher utilization is only possible with more certainty about case duration, but can offer significant financial and productivity-related benefits.
A second important finding is that separate ACT prediction (using the same available variables but without eSCT) yields worse results than direct TPT prediction.
The fact that TPT is the result of ACT and SCT is demonstrated by the best performing model. This model is based on eSCT, type of operation, and the two most important anesthesiologic variables: ASA classification and type of anesthesia used. This means predictions are possible using a limited number of easily obtainable values. Even though our model is intended for use by a computer system, keeping the model simple by requiring fewer inputs improves its usability, understandability, and speed.
The fact that the regression models were calculated and tested using surgeons' actual pre-surgery estimations of SCT instead of recorded, historical SCTs lends additional credibility to our results. In actual planning practice, predictions will similarly need to be based on estimated SCT. Therefore, the performance of the models as described in our results should match real-world performance, as opposed to a likely positive bias when based on historical data. This is especially true for the performance on the 2016 data, which the model was not trained on. While performing our research, it became apparent that the predictions of the 2016 TPTs became increasingly accurate as our collection of training data grew. This suggests that the method described in this paper holds potential for improved performance when applied to even larger datasets, as are becoming increasingly available to health-care data analysts. Additionally, further improvement may be achieved by tailoring the analyses to local circumstances. It is possible to prepare custom models for the level of individual hospitals, departments, types of operations, or even surgeons.
Summarizing the above, we encourage hospital data analysts and surgical managers to create similar models to those described in this paper using as much of their own historical data as possible. The method described is relatively straightforward and might provide them with more accurate procedure time predictions than current practices.
A limitation of this study was that the data used were recorded in academic centers only. The applicability to typical OR schedules in regional hospitals has not been studied. In addition, we have averaged all suitable data available from these academic centers under the assumption that there were no major differences between these centers that might significantly alter the TPT.
The manual registration of the timestamps and semi-manual process of aggregating the other data has two important weaknesses. First, it most probably resulted in inaccuracies of the data, possibly leaning toward late recording of the key moments during the operations. Second, there was a surprising amount of missing data at analysis. Of the records we started with, only ca. 21% contained complete and plausible data in all required fields, making the rest unsuitable for analysis. The fact that the results after imputation of the missing data were very similar to those of our initial analyses indicates that eliminating the incomplete records had limited influence on the outcomes as described.
Both issues underline the importance of the implementation of automatic registration systems that integrate into the work processes in the OR to collect more and better data. Only then will the results of analysis of this data be taken to a higher level, allowing for robust conclusions with operational consequences.
A final important remark is that, despite the new model generally performing well over the long term, a relatively high interindividual variability still exists. This could limit the usefulness of its predictions in day-to-day planning.

CONCLUSION
A linear regression model to predict TPT based on eSCT, type of operation, patient ASA classification, and anesthesia type outperforms the current practices of using a standard duration for ACT or a fixed ratio between eSCT and TPT. A second conclusion is that predicting TPT through the separate prediction of ACT yields less accurate results than direct prediction of TPT.

AUTHOR CONTRIBUTIONS
EE performed the analyses based on advice by SK and drafted the original manuscript. WB provided direct supervision during the entire project. AH, MK, and WB made valuable contributions to the anesthesiological aspects of the research performed and contributed to the article contents. GM independently performed the statistical analyses a second time to confirm the outcomes. He also provided additional advice and feedback on the methods and their textual descriptions.