- 1Department of Hepatobiliary Surgery, General Hospital of Ningxia Medical University, Yinchuan, China
- 2School of Clinical Medicine, General Hospital of Ningxia Medical University, Yinchuan, China
- 3Neonatal Intensive Care, General Hospital of Ningxia Medical University, Yinchuan, China
- 4Department of Thoracic and Cardiovascular Surgery, General Hospital of Ningxia Medical University, Yinchuan, China
- 5School of Basic Medical Sciences, Ningxia Medical University, Yinchuan, China
- 6Department of Pathogenic Biology and Medical Immunology, School of Basic Medical Sciences, Ningxia Medical University, Yinchuan, China
Background: Post-endoscopic retrograde cholangiopancreatography (ERCP) pancreatitis (PEP) is one of the most frequent and severe complications of ERCP. Given recent advances in both endoscopy and artificial intelligence research, it is possible to construct a practical risk prediction model to help identify patients at elevated risk of PEP.
Aim: We developed and validated a concise predictive model for post-ERCP pancreatitis risk with logistic regression (LR), LightGBM, Support Vector Machine (SVM), XGBoost, and Multilayer Perceptron (MLP) neural network models.
Methods: We selected 688 patients who had undergone ERCP to form the basic dataset, with 70% used for training and 30% for validation. Stepwise backward selection based on logistic regression was then used to select pertinent clinical features, which were incorporated into the machine learning (ML) models to construct the final predictive model. The efficacy of the model was evaluated by various metrics. These newly identified clinical features were then incorporated into a simplified, points-based risk scoring system for potential bedside application and further evaluation.
Results: Based on the collected data and the results of stepwise backward regression, we identified the following features as potentially significant clinical variables that influence the risk of post-ERCP pancreatitis and used them to construct a prediction model: periampullary diverticulum, pancreatic stent placement, pancreatic guidewire passages, dilation of the extrahepatic bile duct, age, and coronary artery disease. Several ML models were then constructed to assess the performance of this model. All ML models outperformed the conventional logistic regression (LR) model in terms of AUC, with the XGBoost, SVM, LightGBM, and MLP models all achieving at least acceptable performance. Finally, we developed a simplified scoring system based on the LightGBM model, with an AUC of 0.75.
Conclusions: We developed and validated a concise predictive model for post-ERCP pancreatitis risk, and a simplified scoring system based on the LightGBM model. This model facilitates individual risk prediction and preventive strategy selection.
Introduction
Post-endoscopic retrograde cholangiopancreatography (ERCP) pancreatitis (PEP) is one of the most frequent and serious complications of ERCP, with an incidence ranging from 2.1% to 15.1% (1). Although most PEP cases are mild or moderate, some significantly extend hospitalization, and severe cases can be fatal (2). In response to these concerns, a multitude of risk prediction models for PEP have been proposed, incorporating various patient- and procedure-related factors such as gender, difficult cannulation, history of pancreatitis, and pancreatic duct cannulation (3). However, because of the intricate interplay of risk factors, some of which may act synergistically, these models often suffer from limited discriminative power, complexity, or lack of external validation, which limits their application in clinical practice.
In recent years, machine learning (ML) has gained considerable attention in clinical medical settings. ML algorithms analyze a wealth of variables with complex relationships using methods such as supervised, unsupervised, and semi-supervised learning, offering advantages such as intuitiveness and high predictive efficiency (4). Research indicates that computer-aided diagnostic models substantially assist clinicians in diagnosing and predicting diseases (4). At present, many ML models and algorithms based on diverse architectures have been developed, showing impressive performance in predicting significant diseases in the medical field (5). Hence, our goal is to develop and validate practical PEP prediction models using the latest ERCP database from General Hospital of Ningxia Medical University. This study pursues two objectives: (1) to compare the performance of different ML models in predicting PEP with stringent inclusion and exclusion criteria; (2) to identify a clinically relevant model with as few predictive factors as possible through innovative approaches, aiding endoscopists in decision-making and planning postoperative management.
Materials and methods
Patient characteristics
We conducted a retrospective analysis of patients' clinical data from diagnostic or therapeutic ERCPs at General Hospital of Ningxia Medical University from May 2022 to June 2023. After applying rigorous inclusion and exclusion criteria, we included a total of 688 cases in the training and validation sets. Eligible patients were those with relevant biliary-pancreatic diseases indicated for ERCP, who had provided written informed consent for the procedure and had granted verbal or written permission for postoperative examinations. Patients were excluded if they presented with acute pancreatitis or had a history of previous ERCP or gastrointestinal reconstructive surgery. Patients who were under 18 years old (6), had incomplete medical records or surgical videos, did not complete necessary follow-up examinations in a timely manner, did not undergo a full cannulation attempt because the procedure was abandoned, or were treated by an endoscopist who had performed fewer than 50 cannulation procedures (7) were also excluded (Figure 1).
Figure 1. The pipeline and criteria for ERCP patient selection and exclusion in this study (A) and the PCA plot for ADASYN (B).
Study endpoint and definitions of outcomes
The primary endpoint was the incidence of PEP, defined according to current international guidelines and consensus. The criteria for the occurrence of PEP were as follows: (1) new-onset abdominal pain consistent with pancreatitis (acute persistent upper left abdominal pain); (2) serum amylase levels more than three times the upper limit of normal within 24 h after the procedure; (3) imaging evidence of pancreatitis such as peripancreatic fluid extravasation, pancreatic gland enlargement due to edema, pancreatic duct dilation, pancreatic tissue necrosis, or formation of a pancreatic pseudocyst. According to the 2012 Atlanta consensus, two of these three criteria must be met (7, 8). Failed cannulation was defined as the inability to correctly enter the bile or pancreatic duct despite all techniques and efforts (9). Difficult cannulation was defined, following the 2019 ESGE guidelines, as a cannulation time exceeding 5 min, more than five papillary contacts, or unintended entry into the pancreatic duct more than once (2, 10). We further monitored preoperative serum calcium ion levels, classifying them using our hospital's upper limit of normal value at 2.12 mmol/L. In this study, the threshold for patient age was set at <60. The biliary brush cytology sampling methods included cytological brushing or forceps biopsy (11). “Failure to clear bile duct stones” was defined as stones that were not retrieved or not completely retrieved following the full cannulation process. The patient retained a nasobiliary drain at the conclusion of the procedure (12).
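The outcome definitions above are simple decision rules, and a minimal sketch may make them easier to audit. The class and field names below are illustrative assumptions, not the study's actual data dictionary, and the amylase criterion is assumed to be expressed as a ratio to the upper limit of normal.

```python
from dataclasses import dataclass


@dataclass
class PostErcpFindings:
    """Post-procedure findings for one patient (field names are illustrative)."""
    pancreatic_type_pain: bool   # new-onset abdominal pain consistent with pancreatitis
    amylase_ratio_24h: float     # serum amylase / upper limit of normal, within 24 h
    imaging_evidence: bool       # imaging findings of pancreatitis


def meets_pep_definition(f: PostErcpFindings) -> bool:
    """Return True when at least two of the three criteria are met."""
    criteria = [
        f.pancreatic_type_pain,
        f.amylase_ratio_24h > 3.0,
        f.imaging_evidence,
    ]
    return sum(criteria) >= 2


def is_difficult_cannulation(minutes: float, papillary_contacts: int,
                             pancreatic_duct_entries: int) -> bool:
    """Any one of: > 5 min, > 5 papillary contacts, > 1 unintended duct entry."""
    return minutes > 5 or papillary_contacts > 5 or pancreatic_duct_entries > 1
```

Under these assumptions, a patient with typical pain and an amylase ratio of 3.5 but no imaging evidence would be labelled as PEP, whereas pain alone would not.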
Data processing
To ensure the stability of the results, patients with missing video recording data were excluded (n = 142), which did not affect the results of this study. As the incidence of PEP at our institution was approximately 11%, the proportions of patients with and without PEP were imbalanced. To address this imbalance in the outcome variable, we used the Adaptive Synthetic Sampling (ADASYN) algorithm to oversample patients with PEP, achieving a 1:1 ratio with patients who did not develop PEP postoperatively (13).
Data separation and feature filtering
Prior to oversampling with ADASYN, we performed data separation by randomly dividing the dataset into two subsets (training and test) with a 7:3 ratio. These subsets were used for modelling and validation, respectively. To minimize the overfitting risk introduced by ADASYN oversampling, we implemented a robust validation strategy. Specifically, we employed 10-fold cross-validation during the model training and tuning phase. Furthermore, the final model was evaluated on a completely independent test set that was not involved in either the oversampling or cross-validation processes. Feature selection within the training set was conducted by a stepwise regression approach grounded in logistic regression with backward elimination (14). Covariance test showed no significant abnormality. This procedure was implemented using the MASS package in R software (version 4.2.2). Variables were selected according to the principle of minimising the Akaike Information Criterion (AIC) and a threshold of p-value < 0.01. This process aimed to identify a subset of features that provided an optimal balance between power and dimensionality (15).
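The order of operations described above (split first, then oversample only the training subset) is what keeps the test set free of synthetic samples. The sketch below illustrates that ordering with the standard library only. Two assumptions are made for brevity: the split is stratified by outcome, which the text does not state, and random duplication of minority cases stands in for ADASYN, which instead synthesises new points. The labels are toy values, not study data.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), preserving the class ratio in each subset."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def oversample_minority(train_idx, labels, seed=0):
    """Duplicate minority-class training indices at random until classes are 1:1.

    Stand-in for ADASYN: only training indices are ever resampled.
    """
    rng = random.Random(seed)
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for idx in by_class.values():
        balanced.extend(idx)
        balanced.extend(rng.choices(idx, k=target - len(idx)))
    return balanced


# Toy labels only: 11 positives among 100 cases.
labels = [1] * 11 + [0] * 89
train_idx, test_idx = stratified_split(labels)
balanced_train = oversample_minority(train_idx, labels)
```

Because `oversample_minority` only ever draws from `train_idx`, no test case can leak into the balanced training set.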
ML model building, evaluation and interpretation
The dataset was subjected to both linear and non-linear ML models, including logistic regression (LR), support vector machines (SVM), gradient boosting trees (GBT) and neural network models. The SVM was implemented using both Linear and Radial Basis Function (RBF) kernels, the GBT was implemented using XGBoost (eXtreme Gradient Boosting) and LightGBM (Light Gradient Boosting Machine), while the neural network model was implemented using a Multi-Layer Perceptron (MLP). The predictive performance of all models was evaluated using five metrics: Area Under the Receiver Operating Characteristic Curve (AUROC), precision, recall, F1 score and accuracy. Finally, ML models with acceptable performance were selected based on the AUC curve.
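For reference, four of the five metrics named above follow directly from the confusion matrix. The sketch below is a plain-Python illustration, assuming binary labels coded as 0/1 and predictions already thresholded; it is not the scikit-learn code used in the study.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy example: 3 positives, 5 negatives.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                                 [1, 1, 0, 1, 0, 0, 0, 0])
```

With an imbalanced outcome such as PEP, accuracy alone can look high for a model that rarely predicts the positive class, which is one reason to report precision and recall alongside it.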
Statistical analyses
Categorical variables were assigned numerical codes, and ization was conducted using EXCEL 2018. Python was used for data processing and model development, relying on the scikit-learn library for the construction of machine learning algorithms. Data statistics, variable selection and construction of conventional LR models were performed using R software (version 4.2.2), with a p-value < 0.01 indicating statistical significance. All authors of this article had access to the research data, participated in data collection, and reviewed and approved the final manuscript.
Results
Study population and baseline characteristics
A total of 1,748 patients who underwent ERCP procedures met the criteria for this study. Of these, 514 patients with pre-operative acute pancreatitis (hyperamylasemia) were excluded. A further 396 patients with a history of previous ERCP or gastrointestinal reconstruction were also excluded. A further 150 patients were excluded because the procedure was abandoned or performed by an inexperienced endoscopist, or because the medical records or video data were incomplete. Each patient was counted under only one exclusion criterion. Therefore, 688 patients were included in the final analysis. The study flow chart is shown in Figure 1. Table 1 shows the baseline characteristics of the study population, stratified by PEP status. The distribution of key characteristics is reported, including sex, age, presence of diabetes, coronary heart disease, hypertension, history of biliary surgery, cholecystectomy, acute pancreatitis, Ca2+ and total bilirubin levels, and various clinical procedures and interventions. We then produced the PCA dimensionality reduction plot for ADASYN and found no difference between PEP (happen) and non-PEP (Not) patients (Figure 1B).
Feature filtering and prediction model construction
Based on the candidate features listed in Table 1, we used a stepwise regression method with logistic regression and backward elimination for feature filtering. Seven features were retained on the basis of a lower AIC in the training set and a p-value < 0.01: periampullary diverticulum (PAD), pancreatic stent placement (PSP), pancreatic injection (PI), number of guidewire passages into the pancreatic duct (NGP), dilatation of extrahepatic bile ducts (DEBD), age, and coronary heart disease (CHD). Based on the logistic regression results (Tables 2–4), the conventional LR model predicts the probability of pancreatitis using the following equation:
To enhance the interpretability and clinical reliability of our model, we conducted a comprehensive Shapley additive explanations (SHAP) analysis and calibration analysis. For the five models, the SHAP summary figures intuitively showed the contribution of each feature to the model prediction and highlighted the most influential clinical variables (Figure 2). We then modeled with the features selected by both LASSO regression and the stepwise method, which removed the pancreatic injection (PI) feature from the equation. The final prediction equation was:
We also considered aspirin use among patients with CHD. Of the 72 patients diagnosed with CHD, 50 (69.4%) regularly took aspirin before ERCP, and only one of them (2.0%) developed PEP. Among the 22 patients who did not take aspirin, the PEP incidence was 18.2% (4/22), a significant difference from the aspirin group (p-value = 0.024), suggesting that aspirin may affect PEP incidence in patients with CHD.
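The aspirin comparison above is a 2x2 table with small cell counts (1 of 50 versus 4 of 22). The text does not state which test produced the reported p-value; as one option suited to small counts, the sketch below computes a two-sided Fisher exact test from the hypergeometric distribution using only the standard library. The function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in row 1 with margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over every table that is no more probable than the observed one.
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Counts from the text: aspirin 1 PEP / 49 no PEP; no aspirin 4 PEP / 18 no PEP.
p_value = fisher_exact_two_sided(1, 49, 4, 18)
```

For these counts the function returns a p-value of about 0.028, which need not coincide with the reported value because the source does not specify the test used.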
Model performance evaluation
Predictive performance was assessed with receiver operating characteristic (ROC) curves, and each ML model was compared with the conventional logistic regression (LR) model (Figures 3A–F). Models with an area under the ROC curve (AUC) exceeding 0.75 were considered acceptable. This criterion was met by the following models: LR (Figure 3A) with a test AUC of 0.775 [95% confidence interval (CI) 0.727–0.823]; LR (ML) (Figure 3B) with a test AUC of 0.788 (95% CI 0.7347–0.8408); support vector machine (SVM) (Figure 3C) with a test AUC of 0.812 (95% CI 0.7612–0.8624); XGBoost (Figure 3D) with a test AUC of 0.840 (95% CI 0.7955–0.8851); LightGBM (Figure 3E) with a test AUC of 0.807 (95% CI 0.7575–0.8574); and the multilayer perceptron (MLP) model (Figure 3F) with a test AUC of 0.835 (95% CI 0.7887–0.8822). A detailed performance description for each ML model is provided in Table 5.
Figure 3. Assessment of the AUC values for multiple models. (A–F) Line plot represented the ROC curves for different models. X-axis is specificity, and Y-axis is sensitivity. Red solid line and blue dashed line represented train and test AUC, respectively.
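As an illustration of how an AUC and its 95% CI can be obtained, the sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, and derives a percentile bootstrap interval. The bootstrap is an assumption for illustration; the text does not state how the reported intervals were calculated. Labels are assumed to be coded 0/1.

```python
import random


def auc_score(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define an AUC
            aucs.append(auc_score(ys, [scores[i] for i in idx]))
    aucs.sort()
    return aucs[int(n_boot * alpha / 2)], aucs[int(n_boot * (1 - alpha / 2)) - 1]


# Toy scores only, not study data.
toy_y = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_score(toy_y, toy_scores)
toy_ci = bootstrap_auc_ci(toy_y, toy_scores, n_boot=200)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a test set of a few hundred patients but would be replaced by a rank-based computation for larger cohorts.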
We then performed calibration analysis for each model. The calibration curves indicated that logistic regression (LR) demonstrated nearly ideal calibration, while LightGBM and XGBoost also showed reasonable calibration performance. In contrast, the calibration curves for SVM and MLP exhibited minor deviations from the ideal line, which remained within acceptable limits and did not substantially affect the clinical interpretability of the predictions (Figure 4A). Finally, we included a decision-curve analysis (DCA) to evaluate clinical benefit at different thresholds. This analysis allows for a direct comparison of the net clinical benefit across a range of threshold probabilities, providing a crucial perspective on the model's value in clinical decision-making beyond traditional discrimination metrics (Figure 4B).
Figure 4. Assessment of the five models. (A) Line plot showing the calibration analysis results for each model. (B) Line plot showing the decision-curve analysis results to evaluate clinical benefit for each model.
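Decision-curve analysis rests on the net benefit at a threshold probability p_t, conventionally defined as TP/n minus FP/n weighted by p_t/(1 − p_t). The sketch below implements that standard definition together with the "treat all" reference strategy; the function names and toy inputs are illustrative and are not taken from the study code.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of intervening when predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene on every patient regardless of risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy values only, not study data.
toy_y = [1, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.6, 0.7]
nb_model = net_benefit(toy_y, toy_probs, threshold=0.5)
nb_all = net_benefit_treat_all(toy_y, threshold=0.5)
```

A decision curve is obtained by evaluating these functions over a grid of thresholds; a model adds clinical value at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).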
Finally, we developed a simplified, points-based risk scoring system for potential bedside application based on the relative SHAP importance using the LightGBM model, which showed higher AUC score than other models. Using a pre-specified threshold of 3.46 points (derived from the risk distribution in our cohort), patients can be stratified into a Low-Risk group (score ≤3.46) and a High-Risk group (score >3.46) (Figure 5A). This simplified model, despite its ease of use, retained a clinically acceptable discriminative ability with an AUC of 0.75 on our validation set (Figure 5B).
Figure 5. Construction of a simplified, points-based risk scoring system. (A) The points-based risk scoring system for potential bedside application. (B) Line plot represented the ROC curve for the system.
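Applying a points-based score at the bedside reduces to summing the points for the features present and comparing the total with the cut-off of 3.46 reported above. The sketch below shows only that mechanism. The feature names and point values are hypothetical placeholders, since the actual values appear in Figure 5A and are not reproduced in the text.

```python
RISK_THRESHOLD = 3.46  # cut-off reported in the text


def total_score(present_features, points):
    """Sum the points assigned to each feature that is present."""
    return sum(points[name] for name in present_features)


def risk_group(score, threshold=RISK_THRESHOLD):
    """Low-Risk when score <= threshold, High-Risk when score > threshold."""
    return "High-Risk" if score > threshold else "Low-Risk"


# HYPOTHETICAL point values for illustration only; see Figure 5A for real ones.
example_points = {"feature_a": 1.0, "feature_b": 2.0, "feature_c": 1.5}
example_group = risk_group(total_score(["feature_a", "feature_b"], example_points))
```

Keeping the threshold as a named constant makes it straightforward to re-derive the cut-off if the score is recalibrated in another cohort.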
Discussion
Major findings
In this study, we developed a predictive model for post-ERCP pancreatitis using ML techniques, employing stepwise regression for feature selection. Based on stringent inclusion and exclusion criteria, we independently identified six factors showing highest correlation with outcomes. We consider periampullary diverticulum, pancreatic duct stent placement, more than one guidewire passage into the pancreatic duct, non-dilated extrahepatic bile duct, age, and coronary heart disease as the most significant clinical variables affecting the risk of PEP. In subsequent ML model development, all ML models outperformed the conventional LR model, with Support Vector Machines (SVM), Gradient Boosting Trees (GBT), and Multi-Layer Perceptron (MLP) models achieving acceptable or superior performance. Figure 6 depicts the overall research process.
Comparison with the current models
Post-ERCP pancreatitis (PEP) represents one of the most frequent and severe complications associated with ERCP (16). This complication can substantially diminish patients' quality of life, augment healthcare expenditures, and in its gravest manifestations, result in patient mortality (17). Although a multitude of predictive models have been developed that demonstrate adept performance, their implementation is often impeded by their complexity, which hinders their integration into routine clinical practice. Furthermore, there may be an insufficient level of recognition among younger endoscopists regarding the potential risk of PEP occurrence. Therefore, it is necessary to develop a prediction model with acceptable accuracy and relatively straightforward predictors for clinical use. In 2021, Koichi Fujita et al. created a practical scoring system for PEP based on a multicenter study in Japan. Considering the weight of seven predictive factors, they developed a model with an acceptable fit and accuracy (AUROC = 0.791) (18). In 2022, Chan Hyuk Park et al. proposed a model based on high-risk factors before and after ERCP but failed to validate the effectiveness of their preoperative risk prediction model (19). Recently, our group used a logistic regression-based backward algorithm to select predictive factors for the establishment of traditional and machine learning models for PEP. Our machine learning model showed better performance and prospects for clinical application than previous models. Traditional multivariate regression is susceptible to small sample bias, particularly in the context of low probability events such as PEP. This approach may also exhibit limited generalization capabilities when addressing complex nonlinear relationships and instances where positive event samples are scarce. 
To circumvent these constraints, employing distinct ML algorithm-based models using preoperative and postoperative data can potentially enhance predictive performance and surpass the limitations inherent in existing models (20).
ML model performance
ML models have been widely applied in the medical field due to their advantages in handling complex nonlinear relationships and high-dimensional data (21). A substantial body of literature has discussed the establishment of ML models related to acute pancreatitis. Our group also noted the work published by Livia Archibugi and colleagues in May 2023, which may be the first discussion of using ML to predict the risk of PEP. They innovatively used the SHapley Additive exPlanations (SHAP) method to open the “black box” of algorithms and study how each feature contributes to the model. Unfortunately, neither the traditional nor the ML models achieved AUROC values high enough for clinical use (Gradient Boosting = 0.67, LR = 0.56). They suggested that this might be related to insufficient model training due to data imbalance caused by the low incidence rate (6%) of PEP (22). In our study, after augmenting the training set data using the ADASYN method, all models satisfactorily predicted the occurrence of postoperative pancreatitis in the validation set, with XGBoost showing the best AUC: 0.840 (95% CI 0.7955–0.8851). These results still need to be validated by statistical comparisons such as the DeLong test with a larger sample size in the future. We recognize that ADASYN may introduce a risk of overfitting, leading to biased predictions. Nevertheless, it can effectively improve the performance of classifiers and neural networks when the dataset is highly imbalanced; therefore, it is feasible to use these data to predict PEP events. That is, endoscopists can achieve dynamic prediction based on these features and decide on perioperative strategies, allowing nursing decisions to be more precise and personalized.
Features for PEP prediction
The interpretation and communication of ML models pose a challenge because of algorithmic “black boxes”. In this investigation, a model-agnostic interpretive approach was employed to discern the underlying clinical elements that may contribute to the onset of post-endoscopic pancreatitis in patients. Using a sequential backward stepwise regression algorithm, we identified key clinical variables that are instrumental in predicting PEP. These variables include the presence of periampullary diverticulum, placement of a pancreatic duct stent, the frequency of guidewire cannulation into the pancreatic duct, non-dilated extrahepatic bile ducts, patient age, and the presence of coronary heart disease. Notably, pancreatic duct stent placement, guidewire passages, non-dilated extrahepatic bile ducts, and patient age are all recognized as either risk or protective factors according to the guidelines set forth by the European Society of Gastrointestinal Endoscopy (ESGE) or the American Society for Gastrointestinal Endoscopy (ASGE) (7, 10, 23). NGP was identified as an independent risk factor for PEP in one recent study (24). In another study, a comprehensive systematic review and meta-analysis, non-dilated extrahepatic bile ducts were found to be a significant predictor of PEP (25). Periampullary diverticulum may be somewhat controversial. From the data collected, most of the patients with periampullary diverticulum (PAD) included in our study had an intra-diverticular papilla or a papilla less than 2 cm from the diverticulum. These diverticula could potentially affect the sphincter of Oddi, which normally serves as the main valve for the pancreaticobiliary tract. In patients with PAD, the presence of a diverticulum often causes the papilla to lose its normal morphology partially or completely, leading to variations in the direction of the pancreaticobiliary ducts (26–28). 
Additionally, the lack of duodenal smooth muscle support at the site of the diverticulum can lead to reduced tension in the distal walls of the pancreaticobiliary ducts, diminished emptying capacity, and post-sphincterotomy tissue edema exacerbates this condition (29). A compelling piece of evidence is that sphincter of Oddi dysfunction (SOD) has been identified as an important and independent risk factor for PEP in most studies (10). However, regarding coronary heart disease (CHD), current guidelines do not indicate it as an independent protective or risk factor for PEP. Patients with CHD are often on long-term medications, including aspirin. Yet, in our data, the group with CHD exhibited a lower incidence of PEP, aligning with the findings of a study published by Harsh K. Patel et al. They compared 1,374,773 ERCP procedures and found a lower incidence of PEP in patients with a history of myocardial infarction (MI) or coronary revascularization surgery (PCI or CABG) (14.1% vs. 15.4%, p < 0.001), though they did not discuss the reasons for this finding in their conclusion (30). In many cases, patients undergoing urgent ERCP for conditions such as common bile duct stones or suppurative cholangitis may not switch to heparin bridging therapy in a timely manner. It is speculated that the lower incidence rate of PEP observed in this patient group could be attributed to the protective effect of nonsteroidal anti-inflammatory drugs (NSAIDs) administered preoperatively to those with CHD against PEP. This reflects the interaction between model variables, and since the Backward selection algorithm is generally more flexible in variable selection, considering more interactions and nonlinear relationships, it is better equipped to capture the complex influences and obtain solid association between CHD and other variables in future (31). 
It also reminds us that when assessing a patient's PEP risk, it may be necessary to consider the impact of medications routinely taken by the patient on PEP, which could affect the choice of preventive measures and treatment strategies (32).
Clinical implication and future application
Amidst the extensive discourse on the application and dosage of NSAIDs and other potential prophylactic medications for PEP, as proposed by various clinical guidelines (including those from ESGE, ASGE, and the Chinese Guideline for ERCP, 2018 Edition), it is imperative to consider the evidence underpinning these recommendations. It is questionable how many patients undergo a comprehensive PEP risk assessment preoperatively (2, 10, 33, 34). Therefore, the development of a streamlined predictive model for this population is worth considering for endoscopists. Initially, a thorough evaluation based on the patient's physical condition and radiological findings should be conducted by endoscopists preoperatively. Patients with pre-existing CHD often took aspirin, which, although found to reduce the incidence of PEP in our data, increases the risk of bleeding due to the continued use of anticoagulants when the sphincterotomy is performed during the ERCP procedure (3, 19). Furthermore, if a periampullary diverticulum is discovered during the procedure, particularly Li-Tanaka type I or II diverticula that severely affect the papillary morphology (35), the relationship between the opening of the papilla and the axis of the bile duct should be carefully considered to choose an appropriate cannulation method to reduce the incidence of postoperative complications (36). In this study, we have identified some unique factors for early identification of PEP, which can provide guidance for future multicenter clinical trials. In future multicenter studies, with independent validation and more case data, a scoring system for PEP can be established. Based on the confirmed model, we could develop an online risk assessment tool in the future to estimate the risk of PEP in ERCP patients (37). In summary, if patients present with the high-risk factors mentioned above, clinicians should take corresponding measures to improve patient outcomes and expedite discharge.
Strengths and limitations
This investigation exhibited several notable strengths that warrant emphasis. Firstly, the study implemented a rigorous modeling procedure that encompassed stringent criteria for inclusion and exclusion, exhaustive data processing, meticulous feature selection, model construction, and assessment. This methodological approach yielded a predictive model characterized by streamlined factors yet relatively practical outcomes. Meanwhile, a checklist of reporting guidelines such as transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) should be performed to improve reproducibility (38). Secondly, this study applied ML methods for the first time to validate the filtered model features, offering higher predictive accuracy than traditional linear models even when dealing with complex data patterns and associations. However, there are several limitations to acknowledge. Primarily, this study utilized a retrospective single-center cohort study due to the limitation of time and resource, limiting the generalizability of this model to other regions due to the differences in patient demographics or procedural practices. Despite the strictest inclusion and exclusion criteria, the small sample size necessitates prospective observational studies for more rigorous validation. The smaller sample size also poses the risk of overfitting in ML models. Secondly, due to the low incidence of PEP postoperatively at our institution, there is a significant imbalance between the samples of PEP patients and non-PEP patients. For model evaluation, precision, recall, and other metrics of the predicted model at different thresholds should be reported besides the AUC assessment. This study relies on stepwise logistic regression for feature screening, which may miss prediction patterns that can only be recognized by nonlinear models (such as high-order interaction terms or threshold effects). 
Although the final XGBoost model can still partially compensate for such omissions through tree structure, future research should explore ML methods based on regularization to better adapt to the needs of complex models. The interpretability tools for ML models such as SHAP or LIME also should be considered to confirm the model's clinical decision-making basis. Meanwhile, the information about drugs usage was not fully considered in this study because of the limitation of clinical information of patients. Although an adaptive synthetic sampling algorithm is used to balance the sample sizes, external validation is still required in future research. In summary, these limitations call for a large-scale and multicenter cohort study in future to validate this model.
Conclusion
In conclusion, we developed and validated a streamlined predictive model for PEP, enhancing our understanding of PEP risk factors in our population. The XGBoost and MLP models outperformed other algorithms, highlighting key preoperative and intraoperative variables. These findings can be used to construct specific clinical application scenarios or tools to guide endoscopists in optimizing clinical outcomes for patients with biliary and pancreatic conditions.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.
Ethics statement
The studies involving humans were approved by Ningxia Medical University General Hospital Medical Research Ethics Review Committee with Clinical trial number: 2020-633. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
TD: Conceptualization, Data curation, Investigation, Validation, Writing – original draft. GD: Writing – review & editing. HY: Writing – review & editing. HW: Writing – review & editing. WW: Writing – review & editing. TM: Writing – review & editing. JM: Writing – review & editing. HW: Writing – review & editing. QW: Writing – review & editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This research was supported by the Ningxia Autonomous Region Key R&D Program Project (2021BEG02038).
Acknowledgments
We thank the collaborators for their contribution and involvement in the study.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Keywords: post-ERCP pancreatitis, artificial intelligence, predictive model, machine learning, risk prediction
Citation: De T, Du G, Yin H, Wang H, Wang W, Ma T, Ma J, Wang H and Wang Q (2025) Development and validation of a practical prediction model for post-ERCP pancreatitis using machine learning. Front. Surg. 12:1628956. doi: 10.3389/fsurg.2025.1628956
Received: 15 May 2025; Accepted: 3 October 2025; Published: 3 November 2025.
Edited by:
Wandong Hong, First Affiliated Hospital of Wenzhou Medical University, China
Reviewed by:
Livia Archibugi, San Raffaele Hospital (IRCCS), Italy
Md. Enamul Hoq, University of Arkansas for Medical Sciences, United States
Muhammad Daniyal Waheed, Maroof International Hospital, Pakistan
Copyright: © 2025 De, Du, Yin, Wang, Wang, Ma, Ma, Wang and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Qi Wang, wq-6562@163.com; Hao Wang, wanghaograduate@126.com
Tianyu De1