Machine learning based prediction models for analyzing risk factors in patients with acute abdominal pain: a retrospective study

Background Acute abdominal pain (AAP) is a common symptom presented in the emergency department (ED), and it is crucial to have objective and accurate triage. This study aims to develop a machine learning-based prediction model for AAP triage. The goal is to identify triage indicators for critically ill patients and ensure the prompt availability of diagnostic and treatment resources. Methods In this study, we conducted a retrospective analysis of the medical records of patients admitted to the ED of Wuhan Puren Hospital with acute abdominal pain in 2019. To identify high-risk factors, univariate and multivariate logistic regression analyses were used with thirty-one predictor variables. Evaluation of eight machine learning triage prediction models was conducted using both test and validation cohorts to optimize the AAP triage prediction model. Results Eleven clinical indicators with statistical significance (p < 0.05) were identified, and they were found to be associated with the severity of acute abdominal pain. Among the eight machine learning models constructed from the training and test cohorts, the model based on the artificial neural network (ANN) demonstrated the best performance, achieving an accuracy of 0.9792 and an area under the curve (AUC) of 0.9972. Further optimization results indicate that the AUC value of the ANN model could reach 0.9832 by incorporating only seven variables: history of diabetes, history of stroke, pulse, blood pressure, pale appearance, bowel sounds, and location of the pain. Conclusion The ANN model is the most effective in predicting the triage of AAP. Furthermore, when only seven variables are considered, including history of diabetes, etc., the model still shows good predictive performance. This is helpful for the rapid clinical triage of AAP patients and the allocation of medical resources.


Introduction
AAP is a condition that occurs within the abdomen and has a sudden onset, typically lasting less than a week (1). Patients with AAP are one of the major groups in the ED, accounting for 5-10% of all visits (2-4). AAP can have multiple causes, including gastrointestinal disorders, thoracic cardiovascular disease, and neurological disorders. They also vary in complexity and risk and often involve the clinical care needs of various specialties (including gynecology, pediatrics, internal medicine, surgery, etc.), which may require interdisciplinary and multidisciplinary collaboration, particularly in emergencies such as acute appendicitis, ruptured abdominal aortic aneurysm, and ectopic pregnancy (2, 5, 6). However, in the complex environment of the ED, where most patients with AAP claim to be in urgent need of pain management measures (7, 8), misdiagnosis or other inappropriate management measures can have catastrophic consequences, as well as lead to a range of legal disputes (9-11). This undoubtedly poses a greater challenge to the triage work of healthcare professionals in the ED. Therefore, there is a need to further clarify the risk characteristics of patients with AAP in terms of the severity of their condition. Additionally, there is a need to enhance the ability of healthcare professionals to differentiate the causes of AAP in the ED.
With the utilization of various advanced technologies in clinical medicine, such as computed tomography (CT), ultrasound, and magnetic resonance imaging (MRI), medical imaging plays a crucial role in delivering accurate clinical diagnoses and efficient care for patients with AAP (12, 13). However, non-essential diagnostics inevitably increase the cost of care for patients and the burden on hospital systems. Additionally, there are potential risk factors, such as allergies to contrast media and radiation exposure, that cannot be ignored, particularly in areas with limited healthcare resources (14-16). According to Trentzsch et al., it is crucial to quickly determine the underlying cause in AAP and assess whether urgent or immediate surgical intervention is needed (17). Therefore, there is an urgent need for an efficient triage tool that does not rely solely on radiological techniques but can accurately assess the criticality of AAP based on the physician's initial assessment and laboratory results. The development of machine learning and artificial intelligence has made this possible. Applying machine learning to emergency triage not only helps improve the accuracy of triage but also reduces the workload of medical staff (18-21). Some studies have attempted to apply machine learning or artificial intelligence to emergency triage of patients with AAP (22-24). However, the clinical indicators included in different studies are often limited by clinical practice experience, and there are also differences in disease assessment standards and hospital preferences for patients seeking treatment. Therefore, the generalization ability of these models in various populations and medical institutions is limited (25, 26). In China, due to the relatively recent implementation of the emergency pre-check triage system, there are limited studies on emergency triage related to AAP, and there is currently no standardized approach. Therefore, to accurately triage patients with AAP and select appropriate treatment strategies, it is crucial to establish a triage prediction model for patients with AAP in our hospital.
In this study, we aimed to construct and optimize a prediction model for AAP triage using a machine learning approach that does not depend on imaging diagnosis. We also sought to clarify the high-risk clinical features of AAP patients in terms of the criticality of their condition, so as to achieve accurate triage of patients with AAP.

Study design
This retrospective study was conducted in accordance with the Declaration of Helsinki regarding the Ethical Principles for Medical Research Involving Human Subjects (27). The study was approved by the Ethical Committee of Wuhan Puren Hospital (MR-42-24-001914), and informed consent was waived due to the study's retrospective nature.

Data source
The case information was obtained from the management information system of Wuhan Puren Hospital. The data included the medical records of patients who visited the ED with AAP as their primary complaint between January 1, 2019, and December 31, 2019. A total of 4,323 cases were screened. Epidata version 3.2 was used for data entry. Inclusion criteria: (1) Age ≥ 14 years old; (2) The primary symptom is acute abdominal pain; (3) Complete diagnostic and treatment records with clear diagnosis. Exclusion criteria: (1) Age < 14 years old; (2) Patients who did not continue treatment at this hospital and have an unclear diagnosis; (3) Patients who canceled their appointment; (4) Gynecological acute abdomen, including pelvic inflammatory disease, pelvic mass, torsion of the adnexa, ectopic pregnancy, etc.; (5) Follow-up patients; (6) Patients with more than half of the characteristic variables missing. Based on the inclusion and exclusion criteria, 1,911 patient records that did not meet the requirements were removed, and 2,412 patient records that met the criteria were retained for use in this study.

Key feature variables and outcome indicators
The following candidate characteristic variables were identified according to Hastings and Yew et al. (28, 29):
• Patient demographic characteristics: gender, age, profession;
• Past medical history: hypertension, diabetes, coronary heart disease, stroke, history of abdominal diseases;
• Medical visit details: method of visit, time of visit, body temperature, pulse, respiration, blood pressure, blood oxygen saturation;
• General symptoms and signs of the patient: pale appearance, facial appearance, mental status;
• Abdominal characteristics of the patient: bowel sounds, triggering factors, location of the pain, nausea and vomiting, diarrhea and fever, hematemesis and melena, tenesmus, syncope and consciousness disorders, duration of pain, quality of the pain, pain score, rebound tenderness, and abdominal muscle tension.
FIGURE 1 Schematic diagram of the model structure and parameter settings.
The outcome indicator was whether the triage level indicated a critical patient. Critical patients were defined as individuals who arrived at the hospital with a report of critical illness within the past 24 h, had a resuscitation record for the first 24 h after arrival, had a critical value recorded within 24 h, required emergency surgery, or ultimately died. If any one of these conditions was met, the outcome was coded as 1; otherwise, it was coded as 0.
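The coding rule for the outcome indicator can be sketched as a simple any-of check. This is only an illustration: the field names below are hypothetical and are not taken from the hospital information system.

```python
# Hypothetical field names for the five criteria listed in the text.
CRITERIA = (
    "critical_illness_report_24h",
    "resuscitation_record_24h",
    "critical_value_24h",
    "emergency_surgery",
    "died",
)


def code_outcome(record: dict) -> int:
    """Return 1 if any criterion for a critical patient is met, otherwise 0."""
    return int(any(record.get(name, False) for name in CRITERIA))


# Illustrative records (made up, not study data).
print(code_outcome({"emergency_surgery": True}))
print(code_outcome({"critical_value_24h": False}))
```

A missing field is treated as "criterion not met" in this sketch, which is an assumption rather than something the source specifies.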

Data analysis and model construction
Univariate analysis was conducted to screen the characteristic variables. The characteristic variables with a significance level of p < 0.05 were selected as independent variables, while the patient's critical status was used as the dependent variable. Binary logistic regression was used for multifactorial analysis to identify the risk factors associated with the severity of acute abdomen in patients. The characteristic variables identified through logistic regression were then used as input variables for the prediction model. Statistical analyses were performed using SPSS 26.0 software for Windows (SPSS Inc., Chicago, IL, USA). Variables were classified by type as binary, unordered multicategory, or ordered multicategory data. The chi-square test was used for binary and unordered multicategory data, while Fisher's exact test was employed when at least one expected frequency was less than five. The Kruskal-Wallis rank sum test was used for ordered multicategory data. A p-value of less than 0.05 was considered statistically significant.
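The rule for choosing between the chi-square test and Fisher's exact test can be illustrated for a single 2x2 table (variable present/absent by critical/non-critical). The study itself used SPSS, so the standard-library Python below is only a sketch of the same decision rule. The function names and the example counts are assumptions, and no continuity correction is applied.

```python
from math import comb, erfc, sqrt


def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * col / n for col in cols] for r in rows]


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    expected = expected_counts(table)
    stat = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value: sum over tables no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def screen_variable(table, alpha=0.05):
    """Use Fisher's exact test if any expected count is below 5, else chi-square."""
    if min(min(row) for row in expected_counts(table)) < 5:
        p = fisher_exact_2x2(table)
    else:
        _, p = chi_square_2x2(table)
    return p, p < alpha


# Made-up counts, for illustration only.
print(screen_variable([[20, 10], [10, 20]]))
print(screen_variable([[3, 1], [1, 3]]))
```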
A sample database was established based on the statistically significant characteristic variables that were screened. The unbiased randomized sample allocation method was used to code the screened sample data based on the attribute division criteria. The data were randomly and automatically assigned to the training and testing cohorts in a 4:1 ratio using Python. A total of eight machine learning models were built: logistic regression, K-nearest neighbor, support vector machine, kernel function support vector machine, decision tree, random forest, extreme gradient boosting, and artificial neural network. Model performance was evaluated using accuracy, F1-score, recall, and AUC value.
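As a rough illustration of the 4:1 allocation and of the four evaluation metrics, the sketch below uses only the Python standard library. It is not the authors' code: the function names are hypothetical, the F1-score is computed for the positive (critical) class, which is an assumption, and in practice a library such as scikit-learn would normally be used.

```python
import random


def split_4_to_1(records, seed=0):
    """Shuffle records and split them into training and test cohorts at a 4:1 ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Accuracy, recall, and F1-score for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "recall": recall, "f1": f1}


def auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


train, test = split_4_to_1(range(2412))
print(len(train), len(test))
```

With 2,412 records this split yields cohorts of 1,929 and 483, matching the cohort sizes reported in the paper.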

Characteristics of study subjects
A total of 2,412 cases were included in this study, of which 568 cases (23.5%) were classified as critically ill, and 1,844 cases (76.5%) were not classified as critically ill. There were 1,210 males (50.2%) and 1,202 females (49.8%). The age range of the patients was 15-95 years old. Among them, there were 850 patients aged 15-35 years old, accounting for 35.2% of the total. There were 1,009 patients aged 35-65 years old, accounting for 41.8%. Lastly, there were 553 patients aged 65-95 years old, accounting for 23.0% of the total. Among the patients with different occupations, students accounted for the smallest proportion, with 324 cases (13.4%). Most of these cases were non-critical patients (295 cases). Employees made up the highest percentage of patients at 48.5%, with a total of 1,171 cases. Unemployed and retired individuals fell in the middle of the list, with a total of 917 cases. The results of the univariate analysis revealed significant differences between critically ill and non-critically ill patients in terms of gender, age, and occupation (p < 0.05). However, no statistically significant differences were observed in the presence of diarrhea and fever (Table 1).

Multiple logistic regression analysis
The 31 characteristic variables that were found to be statistically significant were included as independent variables in the multifactorial logistic regression analysis. From this analysis, a total of 11 independent risk factors were identified as significantly associated with the dependent variable. These risk factors include the history of diabetes and stroke, pulse, blood pressure, pale appearance, bowel sounds, location of the pain, nausea and vomiting, hematemesis and melena, quality of the pain, and rebound tenderness (Table 2).
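For reference, the standard relationship between a binary logistic regression coefficient and the odds ratio (OR) reported in an analysis of this kind can be written as follows. The symbols are introduced here for illustration and are not taken from the source: Y = 1 denotes a critical patient, x_j is the j-th characteristic variable, and k is the number of variables in the model.

```latex
\[
\log\frac{P(Y=1\mid x)}{1-P(Y=1\mid x)} = \beta_0 + \sum_{j=1}^{k}\beta_j x_j,
\qquad
\mathrm{OR}_j = e^{\beta_j},
\qquad
95\%\ \mathrm{CI}:\ \exp\!\left(\hat{\beta}_j \pm 1.96\,\mathrm{SE}(\hat{\beta}_j)\right)
\]
```

Under this formulation, a variable is an independent risk factor when its OR is greater than 1 and the confidence interval excludes 1.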

Performance of prediction models
To predict critical patients with acute abdomen, we constructed eight models, which can be classified into two categories: (1) Traditional machine learning models: Logistic Regression (LR), K-nearest neighbors algorithm (KNN), Support Vector Machine (SVM), Kernel SVM, Decision Tree (DT), Random Forest (RF), and XGBoost; (2) Artificial Neural Network (ANN). We used 80% of the samples (n = 1,929) as the training cohort and the remaining 20% (n = 483) as the test set. All eight models were first cross-validated on the training set using 5-fold cross-validation in order to optimize their hyperparameters. The training set was divided into 5 parts of approximately equal size. Each time, four parts of the data were used for training, while the predictive performance of the models was evaluated on the remaining part. In total, five training and testing sessions were conducted to determine the optimal hyperparameters for each model. The accuracy and AUC of the eight models under five-fold cross-validation are shown in Figure 2. Among the eight models, the ANN model achieved the highest AUC value of 0.9877 ± 0.0056, with an average accuracy of 97.67% ± 0.48. Although its accuracy was slightly lower than that of RF and XGBoost, the ANN model performed exceptionally well.
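The five-fold procedure can be sketched as below. This is a minimal standard-library illustration, not the authors' pipeline. In particular, `score_fn` is a hypothetical callback that would train a model with the given hyperparameter value on the training indices and return its score on the held-out indices.

```python
import random


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs so that each sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding gives folds whose sizes differ by at most one sample.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, val_idx


def select_hyperparameter(candidates, score_fn, n_samples, n_folds=5):
    """Return the candidate with the best mean validation score, plus all mean scores."""
    mean_scores = {}
    for value in candidates:
        scores = [
            score_fn(value, train_idx, val_idx)
            for train_idx, val_idx in five_fold_indices(n_samples, n_folds)
        ]
        mean_scores[value] = sum(scores) / len(scores)
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores


# Fold sizes for a training cohort of 1,929 samples.
print([len(val) for _, val in five_fold_indices(1929)])
```

Because 1,929 is not divisible by 5, the folds in this sketch differ in size by one sample.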
We trained the eight models using the entire training set and evaluated their predictive performance on the test set. The accuracy, AUC, recall, and F1 scores of the models on the training and test sets are shown in Table 3. Among the eight models, the ANN achieved the highest AUC of 0.9972 on the test set (Figure 3). In addition, the ANN achieved an accuracy of 97.92% and an F1 score of 0.9793, which were only slightly lower than those of the highest performing Decision Tree (DT) model. It is worth noting that the AUC of the ANN on the test set decreased only slightly compared with its AUC on the training set and remained better than that of all other models. This indicates that the ANN model has better generalization ability than the other models, which is important for clinical applications. Overall, the ANN model has the best predictive performance.

Optimization of ANN algorithm prediction model
To further optimize the ANN prediction model, we randomly included a single feature variable or a combination of multiple feature variables in the analysis. The results showed that when only a single feature variable was included, the AUC of the ANN model ranged from 0.540 to 0.873. When 2-3 variables were included simultaneously, the AUC ranged from 0.599 to 0.905. This suggests that relying solely on clinical feature data of no more than three variables has significant limitations in AAP triage. In contrast, when we included the 11 feature variables obtained from the multifactorial logistic regression analysis simultaneously, the AUC reached 0.993. This value was nearly identical to the AUC when all feature variables were included (Table 4). To identify the main factors among this group of characteristic variables, various combinations were tested. The results showed that when seven variables were included (history of diabetes, history of stroke, pulse, blood pressure, pale appearance, bowel sounds, and location of the pain), the AUC could reach 0.983. We can therefore consider these seven characteristic variables to be particularly important for assessing the severity of AAP.
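One way to organize such a search over variable combinations is sketched below. The source describes the combinations as randomly selected, so the exhaustive enumeration here is an assumption made for illustration, and `evaluate_auc` is a hypothetical callback that would train the ANN on a given subset and return its AUC.

```python
from itertools import combinations

# The 11 variables retained by the multifactorial logistic regression (names from the text).
FEATURES = [
    "history of diabetes",
    "history of stroke",
    "pulse",
    "blood pressure",
    "pale appearance",
    "bowel sounds",
    "location of the pain",
    "nausea and vomiting",
    "hematemesis and melena",
    "quality of the pain",
    "rebound tenderness",
]


def rank_subsets(features, size, evaluate_auc):
    """Score every subset of the given size and return (auc, subset) pairs, best first."""
    scored = [(evaluate_auc(subset), subset) for subset in combinations(features, size)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Number of distinct seven-variable subsets that would need to be evaluated.
print(sum(1 for _ in combinations(FEATURES, 7)))
```

Since each evaluation requires retraining the network, the number of subsets is what makes a random or guided search attractive over a full enumeration.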

Discussion
Before the widespread availability of medical imaging, the traditional treatment approach depended on the expertise of doctors. They would form their opinions based on the patient's medical history and physical examination, as well as their own clinical experience (30, 31). In the event of a missed diagnosis or misdiagnosis, it could directly increase the mortality rate (32). To prevent misdiagnosis of critically ill patients with specific conditions, many of them undergo unnecessary surgery (33). Despite the current agreement on the use of CT and ultrasound in AAP, the complexity of AAP still presents a significant challenge for emergency physicians (29, 34-37). Therefore, it is essential to develop a predictive triage system to stratify the risk of AAP patients as accurately as possible (38-40).
In traditional pre-check triage, patients are assessed based on age, gender, vital signs, SpO2, consciousness, Glasgow score, blood glucose, and pain (41). However, they are not assessed based on secondary complaints, concomitant symptoms, and past medical history. The advancement of computer-assisted decision-making technology has enabled the assessment of disease risk based solely on clinical information, facilitating rapid triage in the ED. Prediction models exist for common intra-abdominal diseases like appendicitis, pelvic inflammatory disease, and left-sided diverticulitis (42-45). However, there is currently no standardized model for assessing risk and triaging patients with AAP. In this study, we retrospectively analyzed the potential critical risk profiles of patients presenting with AAP by screening individuals with a comprehensive medical history and a clear diagnosis. Our results showed that 11 clinical features, including history of diabetes and stroke, pulse, blood pressure, pale appearance, nausea and vomiting, hematemesis and melena, bowel sounds, location and quality of the pain, and rebound tenderness, were strongly correlated with the severity of acute abdomen. The ANN model was also effective in predicting the severity of acute abdomen when assessed by combining only seven variables: history of diabetes mellitus, history of stroke, pulse at the time of consultation, blood pressure at the time of consultation, pale appearance, bowel sounds, and site of pain. The three variables of diabetes history, bowel sounds, and pain site were also utilized as key factors in Wang et al.'s risk stratification method for patients with acute appendicitis (46). It is worth mentioning that this method includes more clinical features and laboratory findings.
While the method provides an accurate score assessment, it may be somewhat limited in situations where healthcare professionals are initially faced with an urgent claim from an AAP patient. This includes challenges such as the timeliness of laboratory findings and the difficulty in determining the etiology of the disease in some AAP patients, even with laboratory tests (47, 48). In addition, it is important to consider a history of stroke in patients with AAP. These patients may be overlooked during triage because they do not exhibit typical clinical symptoms, despite presenting with abdominal pain. Gastrointestinal symptoms may also trigger central nervous system disorders. For example, Taichi et al. reported a case of acute cerebral infarction caused by colon cancer (49). The determination of the pain site variable aligns with the quadrant partitioning commonly used in most studies (50, 51). Diagnostic imaging and algorithms based on it can facilitate a prompt diagnosis (52, 53). Nevertheless, we need to clarify that in certain special cases, we must consider the potential side effects of radiation exposure, particularly in pregnant patients with AAP (54). However, it is undeniable that laboratory testing and imaging have made an excellent contribution to the management of AAP (55). Sufficient clinical information not only facilitates the interpretation of laboratory results but also helps radiologists make accurate imaging diagnoses (56). The use of artificial intelligence (AI) in acute abdominal pain (AAP) can be traced back to the 1970s, when it was introduced by Gunn and applied to AAP diagnosis (57). Since then, AI has begun to play a crucial role in the healthcare system and has been consistently optimized. Brejnebøl et al. demonstrated that AI algorithms based on CT scans have benefited the diagnosis of patients with acute appendicitis, albeit with low sensitivity (58). In a recent review, Lam et al. confirmed the significant role of AI in predicting acute appendicitis and emphasized the need for its development in terms of clinical usability (59). It is important to note that AI relies on machine learning, with different algorithmic models producing varying effects. It requires large amounts of data for validation in multiple simulations, making it an exploratory process. For example, three prediction models were developed in a recent study by Henn et al. (60). The tree-based algorithmic model showed the best performance in AAP-assisted decision making. However, even with the incorporation of laboratory test results, its AUC for predicting surgery was only around 0.8. While our model's performance appears to be strong, it does not necessarily indicate the same level of applicability across different samples. This could be influenced by factors such as sample size and the characteristics of the population included. In addition, it is worth mentioning that AI is not only widely used in AAP but also played a crucial role in the COVID-19 pandemic, helping governments and healthcare workers make timely and accurate judgments and greatly reducing the loss of life and property (61).
Although the ANN model in this study performs well in predicting AAP triage, it still has the following limitations. Firstly, the conclusions drawn from this retrospective study are limited by the available data. Secondly, the time window used to define critically ill patients in this study was 24 h. This definition may focus more on patients with rapidly deteriorating or extremely severe conditions, thereby ignoring those who are equally dangerous but relatively less urgent. Thirdly, because the number of acute abdominal cases related to obstetrics and gynecology in our hospital was limited and not representative, these cases were excluded from this study. Finally, this was a single-center study, and no additional external validation trials have been conducted.

Conclusion
Pre-screening and triaging patients with acute abdominal pain is a major challenge in healthcare. In this study, we developed a machine learning algorithm to construct an AAP triage prediction model, with the artificial neural network (ANN) model demonstrating the best performance. The model can assist clinical staff in promptly and accurately identifying patients at high risk of acute abdomen. This enables them to take timely interventions to reduce the danger, relying solely on seven crucial risk factors, especially in situations with limited medical resources. Although an increasing number of studies have begun to focus on the application of AI in clinical diagnostic decision-making, rigorous scientific validation is necessary to assess its clinical usability.
The future emphasis should be on developing and validating joint analysis and prediction models based on multi-center big data. This will help advance the development and application of outcome prediction and treatment plan prediction models.

FIGURE 1
The training, validation, and testing of the model are run based on Python 3.8.8, sklearn 1.3.0, tensorflow-gpu 2.4.0, and Keras 2.4.0, with an NVIDIA GeForce 1650 GPU. The structure and parameter settings of the ANN model can be found in Figure 1.

FIGURE 2 Accuracy (A) and AUC (B) of eight prediction models on the training cohort with five-fold cross-validation.

FIGURE 3 Accuracy (A) and AUC (B) of eight prediction models on the training and testing cohorts.

TABLE 1
Demographic characteristics of patients with acute abdomen.
a Chi-square test; b Fisher's exact test; c Kruskal-Wallis rank sum test.

TABLE 2
Multiple regression analysis for OR detection.

TABLE 3
Performance comparison of eight models in predicting AAP criticality.