Evaluation of different machine learning algorithms for predicting the length of stay in the emergency departments: a single-centre study

Background
Recently, crowding in emergency departments (EDs) has become a recognised critical factor affecting public healthcare worldwide. It results both from the growing mismatch between the supply of and demand for medical services and from the shortage of beds in inpatient units and EDs. The length of stay in the ED (ED-LOS), i.e., the time a patient spends in the ED, has been found to be a significant indicator of ED bottlenecks. ED-LOS can be lengthened by inefficient care processes, and prolonged stays are associated with increased mortality and health expenditure. Therefore, it is critical to understand the major factors influencing ED-LOS through forecasting tools that enable early improvements.

Methods
The purpose of this work is to predict ED-LOS using a limited set of features related both to patient characteristics and to the ED workflow. Several factors (age, gender, triage level, time of admission, arrival mode) were selected and analysed, and machine learning (ML) algorithms were then employed to predict ED-LOS. The ML procedures were applied to a dataset of patients obtained from the ED database of the “San Giovanni di Dio e Ruggi d’Aragona” University Hospital (Salerno, Italy) for the period 2014–2019.

Results
Over the years considered, 496,172 admissions were evaluated, and 143,641 of them (28.9%) had a prolonged ED-LOS. In the complete data (48.1% female vs. 51.9% male), 51.7% of patients with prolonged ED-LOS were male and 47.3% were female. By age group, patients over 64 years were the most affected by prolonged ED-LOS. The Random Forest algorithm performed best, achieving the highest accuracy (74.8%), precision (72.8%), and recall (74.8%) in predicting ED-LOS.

Conclusions
Several variables, referring to patients’ personal and clinical attributes and to the ED process, have a direct impact on ED-LOS. The proposed prediction model shows encouraging results; thus, it may be applied to anticipate and manage ED-LOS, helping to prevent crowding and to optimise the effectiveness and efficiency of the ED.


Introduction
Emergency departments (EDs) offer a fundamental service, providing emergency care 24 h a day, 365 days a year, and therefore play a critical public service role. The service is ensured by doctors who treat patients' injuries and illnesses, both acute and chronic. The ability to provide fast and decisive access for patients experiencing a medical emergency is among the criteria used to evaluate an ED's performance (1).
There are two types of performance indicators: general and specific. In particular, visit priority code and process time for the assigned code are general indicators, while the following are specific indicators: the relationship between length of hospital stay (LOS) and number of accesses per code, the relationship between triage-to-visit time and number of accesses per code, the analysis of the flow of arrivals at triage by time slot, and the application of the Pareto rule to analyse the performance of medical staff (2). Crowding in EDs has been recognised as a critical challenge in hospital administration (3,4). ED crowding has been linked to a variety of negative consequences, including increased admission costs, longer stays and waits, and higher mortality rates (5)(6)(7)(8). Specifically, prolonged LOS is a significant metric for tracking ED crowding and the resulting saturation, which has proved difficult to assess directly. LOS is recognised as a marker of quality of care in numerous healthcare settings and, as earlier research has shown, its variation relates to multiple aspects, mainly associated with the patients' characteristics and the healthcare workflow (9,10).
The LOS in ED (ED-LOS) is defined as the interval from ED arrival to ED departure (11), i.e., the time between the registration of patients in the ED and their discharge from it, whether they are admitted to a medical ward, transferred to another centre, or discharged home (12). Being able to recognise the variables significantly linked to high ED-LOS may be important, since health policymakers could use this knowledge to reduce the occurrence of the problem. Of course, a prolonged ED-LOS is a consequence of crowding in EDs.
The LOS in healthcare can be influenced by a variety of circumstances and conditions. As a result, several methodologies have been proposed in the literature to study the factors that influence LOS in healthcare processes. Among them, regression models and artificial intelligence techniques have been widely applied, with satisfactory performance, to predict LOS (13)(14)(15)(16)(17)(18)(19) and to address other healthcare-related problems, such as the processing and analysis of biomedical data and signals (20)(21)(22)(23)(24)(25)(26), the development of clinical decision support systems (27,28), and the quality assessment of medical services. In fact, LOS has already been used as a target output in healthcare, and other recent studies have aimed at predicting it in different fields (29,30).
Multiple studies show that ED-LOS is affected by a multitude of complex variables, including hospital organisation and management, clinical staffing, bed occupancy, and triage procedures (31)(32)(33)(34). The clinical status of patients, as well as characteristics such as sex, age, and comorbidity, is also linked to ED-LOS (35).
An inflow of patients via ambulance exacerbates the problem by bringing additional patients into a department that is already at or near capacity. Meanwhile, the clinical condition of patients already triaged and waiting may worsen within minutes of registration, leading to poor outcomes (36). Extended LOS also leads to provider dissatisfaction and subsequent burnout (37).
No precise cut-off has been established for defining an extended ED-LOS, with proposed values ranging from 3 to 12 h (38,39). Several studies have defined an extended ED-LOS as a stay of more than 3 h in the ED, and other studies have shown that complications can occur in patients diagnosed more than 3 h after ED admission (40)(41)(42)(43)(44). Furthermore, a longer ED-LOS has a damaging effect on key areas, including the patient's experience of emergency care, the risk of adverse events, and the percentage of patients who leave the ED without being seen (45).
As a result, the ability to anticipate LOS and reduce ED congestion by recognising the features that lead to prolonged LOS is critical to improving the quality of emergency care and to helping hospital managers plan ED operational processes (42).
Machine learning (ML) techniques are frequently used in the health sector to support diagnosis, forecast patient outcomes, and allocate staff resources (46,47). Following this approach, many researchers have examined ways to predict ED-LOS using data processing and, in particular, ML strategies (48)(49)(50). ML uses several predictors (also known as features) to build models that classify an output (usually a categorical variable in the context of classification). When the output is a numeric variable, regression algorithms are required instead; nevertheless, several algorithms can be used for both classification and regression tasks.
Based on these considerations, the aim of this work, which extends and improves a prior work presented at a conference (51), is to build a useful model to forecast ED-LOS in advance by applying ML classification techniques. Predicting ED-LOS in advance from a few variables would help plan future adjustments in financial and staffing allocation with the goal of reducing ED-LOS.

Context
In this work, we analysed a large dataset of patients extracted from the ED database of the University Hospital of Salerno, "San Giovanni di Dio e Ruggi d'Aragona." It is a University Hospital of national importance with the highest number of accesses in the Campania region (an average of 95,000 accesses per year, with about 250 registrations per day at the ED, only some of which require subsequent hospitalisation).
Table 1 shows the staff working at the ED and the availability of beds in the ED in this University Hospital, according to the admission code.
Moreover, there are 895 beds for acute patients.

Data collection
The data analysed in this study cover the period between 2014 and 2019 (496,172 admissions); later years were excluded to avoid overlap with the COVID-19 pandemic.
The dataset was prepared to make it suitable for processing by the ML algorithms. Specifically, records in which any predictor variable or the ED-LOS was missing were excluded from the analysis. This choice is justified by the fact that only 50 entries lacked some of the features, so we deemed that removing these 50 records would not alter the results of the model. The characteristics considered for each record of the dataset are as follows:
• Gender: male/female (coded as 0/1 for further processing).
• Access mode: divided into two classes: (1) autonomous, i.e., patients reaching the ED by themselves; (2) via ambulance, i.e., patients brought to the ED by ambulance.
• Triage score: divided into five classes according to the colour assigned at admission based on the severity of the patient's clinical condition, increasing from white code (absence of severe symptoms) to green, yellow, red, and black code (death).
• Time of admission: split into the following time windows: 00:00-06:00, 06:00-12:00, 12:00-18:00, 18:00-24:00.
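The preparation steps listed above can be sketched as follows. This is a minimal standard-library Python illustration, not the authors' script: the record field names (`gender`, `age`, `access_mode`, `triage`, `arrival_hour`, `los_hours`) and the integer codes for access mode and triage colour are hypothetical. Only the 0/1 gender coding, the four 6-hour admission windows, and the exclusion of incomplete records come from the text. The sketch also assumes that a boundary hour such as 06:00 belongs to the later window.

```python
from typing import Dict, List, Optional

# Hypothetical field names; every one must be present for a record to be kept.
REQUIRED_FIELDS = ("gender", "age", "access_mode", "triage", "arrival_hour", "los_hours")

GENDER_CODE = {"male": 0, "female": 1}            # 0/1 coding stated in the text
ACCESS_CODE = {"autonomous": 0, "ambulance": 1}   # assumed integer coding
TRIAGE_CODE = {"white": 0, "green": 1, "yellow": 2, "red": 3, "black": 4}  # assumed ordinal coding


def time_slot(hour: int) -> int:
    """Map an admission hour (0-23) to one of four 6-hour windows:
    0 = 00:00-06:00, 1 = 06:00-12:00, 2 = 12:00-18:00, 3 = 18:00-24:00."""
    if not 0 <= hour <= 23:
        raise ValueError(f"invalid hour: {hour}")
    return hour // 6


def encode(record: Dict) -> Optional[Dict]:
    """Return the encoded record, or None if any required field is missing."""
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return None
    return {
        "gender": GENDER_CODE[record["gender"]],
        "age": int(record["age"]),
        "access_mode": ACCESS_CODE[record["access_mode"]],
        "triage": TRIAGE_CODE[record["triage"]],
        "time_slot": time_slot(record["arrival_hour"]),
        "los_hours": float(record["los_hours"]),
    }


def prepare(records: List[Dict]) -> List[Dict]:
    """Complete-case analysis: drop incomplete records, encode the rest."""
    encoded = (encode(r) for r in records)
    return [e for e in encoded if e is not None]
```

Dropping incomplete rows (complete-case analysis) is the simplest option here because, as stated above, only 50 records were affected.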
Among the features available in the ED dataset, the selection was mainly driven by the knowledge of specialists. Physicians experienced in dealing with prolonged ED-LOS helped us classify the features according to their effect on LOS. Factors considered to have a low impact on LOS, i.e., nationality, residence, and the triage doctor on duty, were eliminated. Moreover, the selected input features are variables reported as factors influencing LOS in the reviewed literature. The characteristics considered to have a high impact on LOS were evaluated for each record of the dataset and are provided in Table 2. Of note, the chosen variables, which are then used to build the models, are readily available in healthcare facilities, which means that the proposed analysis process could be easily implemented elsewhere.
As requested by the hospital's health management, ED-LOS was considered prolonged when it exceeded 3 h. This choice also took into account the distribution of the data, which did not allow a different threshold to be used without compromising the creation of the classes. Moreover, since the most common thresholds in the literature are 2 and 4 h, a 3 h threshold appears acceptable (52,53).
The dataset included 143,641 stays (28.9%) with an ED-LOS longer than 3 h and 352,531 (71.1%) with an ED-LOS of up to 3 h.
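The dichotomisation of ED-LOS and the resulting class shares can be checked with a short calculation. In this sketch the function names are hypothetical; a stay counts as prolonged only when it is strictly longer than 3 h, matching the "greater than 3 h" wording above, and the counts are those reported for the dataset.

```python
THRESHOLD_HOURS = 3.0  # prolonged ED-LOS cut-off requested by the hospital's health management


def label_prolonged(los_hours: float, threshold: float = THRESHOLD_HOURS) -> int:
    """1 = prolonged ED-LOS (strictly greater than the threshold), 0 = otherwise."""
    return int(los_hours > threshold)


def class_shares(n_prolonged: int, n_total: int) -> tuple:
    """Percentages of prolonged and non-prolonged stays, rounded to one decimal."""
    share = 100.0 * n_prolonged / n_total
    return round(share, 1), round(100.0 - share, 1)


print(class_shares(143_641, 496_172))
```

`class_shares(143_641, 496_172)` returns `(28.9, 71.1)`, matching the percentages reported above. A roughly 29/71 split is imbalanced but not extreme, which is consistent with the remark that the 3 h threshold kept the class distribution usable.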

Machine learning
ML techniques learn a function that maps input data to an output in order to predict its value. This generic learning task supports predictions on new samples described by the same input parameters. In this study, ML classification algorithms were implemented to forecast the ED-LOS. To handle the data, the Google Colab platform, a cloud computing platform that supports Python, was used to develop a script that, starting from the input parameters, automatically predicts the ED-LOS class. In our analysis, sex, age, arrival mode, triage score, and admission time slot were used as input data for the classification algorithms. As output, the ED-LOS was converted into a categorical variable: the total ED-LOS was dichotomised using a cut-off of 3 h, above which a stay was considered prolonged, as requested by the hospital's health management and as necessary to obtain a class distribution that was not too imbalanced. Since each patient record was labelled with its ED-LOS class, supervised learning algorithms were used. Four distinct algorithms were applied to the classification: Random Forest (RF), a Neural Network based on a Multilayer Perceptron (MLP), Naïve Bayes (NB), and Logistic Regression (LR). LR estimates the probability that an input belongs to a given class using a logistic sigmoid function and assigns the input to a class according to that probability. LR is well suited to our study because the outcome to be predicted is binary (normal or prolonged LOS) and depends on a set of predictor variables. The choice of these ML techniques was motivated primarily by the intention to use algorithms based on different theoretical approaches and, secondarily, by the possibility of improving the efficiency of learning on the dataset by tuning their parameters. The classifiers were all taken from MLlib, the ML library of Apache Spark (Apache Software Foundation). Because this is a single-centre study, no external validation was performed; instead, the performance of the algorithms was tested using ten-fold cross-validation, which gives a more reliable accuracy estimate than a single hold-out split. Moreover, the parameters of each classifier were carefully tuned according to its individual properties, and the quality of the resulting model was assessed.
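Ten-fold cross-validation, as described above, can be sketched in plain Python. This illustrates the technique and is not the MLlib code used in the study (Spark MLlib provides its own `CrossValidator` with a `numFolds` parameter, and the text does not say how the folds were built). The helper names, the fixed shuffling seed, and the `fit`/`predict` callables are hypothetical. The records are shuffled once and split into ten folds of nearly equal size, each used exactly once as the test set.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 42) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # shuffle once, reproducibly
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def cross_validate(fit: Callable, predict: Callable, X: Sequence, y: Sequence, k: int = 10) -> float:
    """Mean accuracy over k folds; fit(X, y) returns a model, predict(model, X) returns labels."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

For example, a trivial majority-class baseline can be plugged in as `fit=lambda X, y: max(set(y), key=y.count)` and `predict=lambda m, X: [m] * len(X)`; a real study would pass wrappers around RF, MLP, NB, or LR instead. Because every record is tested exactly once, the mean accuracy uses all the data for evaluation, unlike a single hold-out split.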
Once the four classifiers had produced their predictions, a voting ensemble used their outputs to determine the majority class to be assigned to each patient's ED-LOS; that is, to obtain better performance, the voter adopts an ensemble approach based on a majority policy. Specifically, voting assigns to each record (i.e., patient) the class predicted by at least three of the four classifiers, i.e., the option that obtains more than half of the votes.
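The majority-voting rule can be written in a few lines. This sketch assumes binary labels (0 = normal, 1 = prolonged ED-LOS) and one prediction per classifier. With four voters, a 2-2 split reaches no majority, and the text does not say how such ties were resolved; the fallback to the RF prediction used here is purely a hypothetical choice.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def majority_vote(votes: Sequence[int], min_votes: int = 3) -> Optional[int]:
    """Return the label predicted by at least `min_votes` classifiers,
    or None if no label reaches that quorum (e.g. a 2-2 split)."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_votes else None


def ensemble_predict(per_model: Dict[str, List[int]], fallback: str = "RF") -> List[int]:
    """Combine per-classifier predictions record by record.

    `fallback` is a hypothetical tie-breaking rule: when no majority
    exists, the named classifier's prediction is used instead.
    """
    names = list(per_model)
    n_records = len(per_model[names[0]])
    combined = []
    for i in range(n_records):
        decision = majority_vote([per_model[name][i] for name in names])
        combined.append(per_model[fallback][i] if decision is None else decision)
    return combined
```

For instance, with predictions `{"RF": [1, 0, 1], "MLP": [1, 0, 0], "NB": [1, 1, 0], "LR": [0, 0, 1]}` the first two records get clear majorities (1 and 0), while the third is a 2-2 tie and falls back to RF, giving `[1, 0, 1]`.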
The performance of the models was assessed by computing the following evaluation metrics:
• Accuracy is the ratio of correct predictions to the total number of predictions: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$.
• F-measure is the harmonic mean of precision and recall: $F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$.
• Precision is the ratio of correct predictions of a class to the total number of times the model predicts that class: $\text{Precision} = \frac{TP}{TP + FP}$.
• Recall is the ratio of correct predictions of a class to the total number of cases in which that class actually occurs: $\text{Recall} = \frac{TP}{TP + FN}$.
where TN is true negative, TP is true positive, FN is false negative, and FP is false positive. Finally, the 95% confidence interval of each metric was determined using the Normal Approximation Interval method based on a test set (54), i.e., $p \pm 1.96\sqrt{p(1-p)/n}$, where $p$ is the value of the metric and $n$ the number of test samples.
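The metric definitions and the normal-approximation confidence interval can be computed directly from confusion-matrix counts. In this sketch the function names are hypothetical. The interval is $p \pm z\sqrt{p(1-p)/n}$ with $z = 1.96$ for 95%, clipped to [0, 1]. A side observation, not stated by the authors: if precision and recall were averaged over both classes and weighted by class size (as multiclass evaluators commonly do), the weighted recall always equals accuracy. That would be consistent with the identical accuracy and recall reported for RF (74.88%).

```python
import math
from typing import Dict, Tuple


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F-measure for the positive class."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f_measure": f_measure}


def weighted_recall(tp: int, tn: int, fp: int, fn: int) -> float:
    """Per-class recall averaged with weights equal to class sizes (equals accuracy)."""
    n_pos, n_neg = tp + fn, tn + fp
    rec_pos = tp / n_pos if n_pos else 0.0
    rec_neg = tn / n_neg if n_neg else 0.0
    return (n_pos * rec_pos + n_neg * rec_neg) / (n_pos + n_neg)


def normal_approx_ci(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal approximation (Wald) interval for a proportion p measured on n samples."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)
```

With TP = 30, TN = 50, FP = 10, and FN = 10, accuracy is 0.8 and precision, recall, and F-measure are all 0.75; the 95% interval for the accuracy with n = 100 is approximately (0.722, 0.878). With test sets as large as those in this study, the interval becomes very narrow.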

Results
First, we present graphically some characteristics of the patients enrolled in this study over the six years analysed: 496,172 patients met the eligibility criteria, and 143,641 of them (28.9%) had an ED-LOS greater than 3 h. In the selected sample, 238,998 females (48.1%) and 257,174 males (51.9%) were registered at the ED.
Among the patients with a prolonged LOS, 47.3% were female and 51.7% were male; 71.0% reached the ED autonomously and the remaining 29.0% by ambulance. The histogram in Figure 1A shows the age distribution of patients with prolonged ED-LOS: these patients belong mainly to the over-64 age group.
Analysing the time slots in which patients were admitted, we found that prolonged LOS occurs mainly in the 06:00-18:00 slots, when the ED is busiest, as shown in Figure 1B.
Regarding the assigned triage colour, green-code patients are the most numerous among those with prolonged ED-LOS (Figure 1C), as might be expected given that green is also the most frequently assigned code.
Second, the four ML algorithms were evaluated to determine which best predicts ED-LOS. Using the processing implemented on the Colab platform, four performance measures (accuracy, F-measure, precision, and recall) were calculated. The values obtained for the classification of ED-LOS are reported in Table 3.
The results show that the accuracies of the classifiers are comparable and that the ensemble learning approach, used to try to improve performance, does not score better than the individual algorithms. RF achieves the highest accuracy (74.88%), precision (72.85%), and recall (74.88%).

Discussion and conclusion
ED-LOS is a key indicator of the efficiency and appropriateness of healthcare services. Understanding the reasons for prolonged LOS in the ED could support the detection of "bottlenecks" in ED organisation. Indeed, LOS is currently an important indicator for health facilities, and other studies have already used it as a variable to be predicted in order to improve the efficiency of hospital management (13)(14)(15)(16)(17)(18)(19).
Patient-related data, such as age, gender, admission time, and triage score, were included as predictors and analysed. The study used a dataset of patients registered at the ED of the "San Giovanni di Dio e Ruggi d'Aragona" University Hospital (Salerno, Italy) between 2014 and 2019. After collecting and analysing all data, we built models to predict ED-LOS using the following ML algorithms: RF, MLP, NB, and LR, exploiting ML techniques to develop classification algorithms from sample, or "training", data. We extended the previous paper presented at a conference (51) by increasing the size of the dataset used to train our prediction model, considering a longer timeframe and therefore a higher number of entries. We also improved the previous analysis by considering ML algorithms based on different theoretical principles and comparing their performance. Indeed, unlike other studies, the large volume of data allowed us to train several sophisticated ML algorithms appropriately. The performance of the resulting prediction model was then evaluated to assess its capacity to enhance ED management.
The available data indicate that patients with prolonged ED-LOS belong mainly to the elderly group (over 64) and that the main issues occur in the time window between 6 a.m. and 6 p.m., when ED crowding may occur. Moreover, the largest share of the sample, corresponding to 80% of ED accesses, was triaged as green code, i.e., non-urgent cases. This may be due to limited accessibility of primary healthcare facilities, insufficient availability of outpatient treatment, a poor understanding of the function of the ED, and inadequate discharge follow-up planning. Starting from patients' features, ML makes it possible to detect extended ED-LOS pre-emptively. The large sample size used to train the ML classifiers and the good values of the performance measures are the two main strengths of our prediction model. We gathered a total of 496,172 entries, with 143,641 patients with prolonged ED-LOS (greater than 3 h), even after cleaning the dataset by eliminating all records pulled from the University Hospital's database with missing details.
Lastly, an ensemble learning approach was used to try to improve the performance of the data processing. To evaluate the voter's performance, its overall accuracy was computed and compared with that of the other procedures. RF has an accuracy of 74.8%, a precision of 72.8%, and a recall of 74.8%, the highest among all the algorithms. The complete system design was intended to yield a prediction model with better accuracy than the individual classification methods, although in our data the voting ensemble did not outperform the best single classifier. By predicting LOS in the ED with this approach, improvement activities could be implemented to reduce it. First, staffing could be adjusted according to the number of patients in the ED to enhance the triage and patient assessment processes. Furthermore, the results could be used to build new protocols to improve ED workflow and to regulate decision-making on bed utilisation and patient placement based on severity. A direct comparison with other ML-based studies is not always fair, since differences in study design can influence the results. Nevertheless, other researchers have also built predictive models to classify ED-LOS.
Turgeman et al. performed a regression analysis, applying a regression tree model to predict LOS based on static inputs (i.e., values that are known at the time of admission and do not change during the patient's hospital stay) (14); they included several tens of predictors and obtained a coefficient of determination between 0.75 and 0.8.
Similarly, Naemi et al. predicted ED-LOS using predictors available at admission, including pulse rate, arterial blood oxygen saturation, respiration rate, systolic blood pressure, triage category, arrival ICD-10 codes, and gender; they performed both a regression, with a final coefficient of determination of 0.33, and a classification, with a final accuracy ranging from 66% to 82% (55).
Rahman et al. implemented a decision tree including 33 attributes to identify patients at high risk of prolonged ED-LOS and reached an accuracy of 85% (56).
In summary, the results of this study can be used to develop a preventive plan to optimise ED management by controlling ED-LOS, thus reducing ED crowding and the associated financial costs. ED disorganisation clearly causes congestion and delays and influences patients' decisions to leave the ED before seeing a doctor. This is likely because, as the number of patients grows (e.g., during a specific time slot), healthcare staff become increasingly insufficient and waiting times rise. By predicting prolonged ED-LOS, decision makers could pay more attention to the need for supplementary medical, nursing, and support staff in work shifts that are known to be critical. Even though the impacts of the considered parameters may not be universal, the technique could be applied in any ED for a localised LOS study, with changes planned based on local observations. This study has some limitations: despite the large amount of available data, it is a single-centre study, and the output was made binary. Moreover, to implement our workflow, we employed only a few features, which means that the algorithm achieved these results using only age, gender, triage level, time of admission, and arrival mode as inputs. To further differentiate the algorithms' performance, future work could include other features that may further boost the evaluation metrics. Future improvements of this research might consider important root causes and access to information such as bed shortages, staff shifts, and delays in radiology and laboratory units, which would allow a more powerful model to be obtained. Because some of these organisational causes are external and not directly due to the ED process, measures that also address these aspects might cover and improve the overall healthcare chain instead of the ED service only.
Moreover, comparing datasets from similar hospitals could be an interesting development to identify, from a broader perspective, the factors that most influence crowding and the increase in ED-LOS.
Another interesting development could be the implementation of this analysis process on data acquired after the COVID-19 pandemic, to compare the results before and after it.
Finally, we recognise that the number of features included in this study to make the predictions is limited. This is, of course, a limitation, but it also suggests that expanding them in future work may further improve the proposed models.

FIGURE 1 (A) Age group distribution for patients with prolonged ED-LOS. (B) Admission time slots for patients with prolonged ED-LOS. (C) Triage colour for patients with prolonged ED-LOS.

TABLE 1
Staff and beds available per triage code in the ED.

RF is an ensemble method that uses bootstrapping to train numerous decision trees in parallel on different random subsets of the dataset and of the features; through bagging, RF then combines the outputs of the individual trees. RF was chosen in our study because it is robust to noise in the dataset and less prone to overfitting than a single decision tree; moreover, it is fast and often outperforms other tree-based methods. NB is a probabilistic classification algorithm based on Bayes' theorem; it assumes that the features are conditionally independent given the class, a strong constraint that nevertheless often yields good results in binary classification. Following these principles, the aim of using NB in our case study is to find the LOS class with the highest probability given the patient's features. MLP is a multilayer feed-forward neural network trained by backpropagation, which maps input data to outputs by estimating the network weights during training. Our analysis could take advantage of two MLP properties: it is static, since a given input produces a single output, and memoryless, since the output does not depend on the previous state of the network. LR is a probabilistic classification algorithm used to estimate the probability that an input belongs to a given class.
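The NB decision rule described above (choose the LOS class with the highest probability given the patient's features) can be sketched as a categorical Naive Bayes classifier. This is a generic illustration, not the MLlib implementation used in the study. It works on tuples of categorical feature codes (e.g., hypothetical codes for access mode and triage colour), uses add-one (Laplace) smoothing with a hypothetical `alpha=1.0`, and compares log-probabilities to avoid numerical underflow. Summing per-feature log-likelihoods is exactly where the conditional-independence assumption enters.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def fit_categorical_nb(X: List[Tuple], y: List[Hashable], alpha: float = 1.0) -> Dict:
    """Count class frequencies and per-class value frequencies for each feature."""
    n_features = len(X[0])
    counts = defaultdict(Counter)              # (class, feature index) -> Counter of values
    values = [set() for _ in range(n_features)]  # observed values per feature
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            counts[(label, j)][v] += 1
            values[j].add(v)
    return {"classes": Counter(y), "counts": counts, "values": values,
            "alpha": alpha, "n": len(y)}


def predict_nb(model: Dict, row: Sequence) -> Hashable:
    """Return the class maximising log P(class) + sum_j log P(x_j | class)."""
    best_label, best_score = None, -math.inf
    for label, n_c in model["classes"].items():
        score = math.log(n_c / model["n"])  # log prior
        for j, v in enumerate(row):
            num = model["counts"][(label, j)][v] + model["alpha"]
            den = n_c + model["alpha"] * len(model["values"][j])
            score += math.log(num / den)  # smoothed log-likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, training on `X = [(1, 2), (1, 3), (1, 2), (0, 1), (0, 1), (0, 0)]` with `y = [1, 1, 1, 0, 0, 0]` yields a model that predicts class 1 for `(1, 2)` and class 0 for `(0, 1)`. Smoothing keeps a feature value never seen with a class from forcing that class's probability to zero.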

TABLE 3
Evaluation metrics, expressed in %, and confidence intervals of the ML algorithms.