- 1Department of Magnetic Resonance Imaging, The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Luzhou, China
- 2School of Physical Education, Southwest Medical University, Luzhou, Sichuan, China
- 3The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Luzhou, China
- 4Department of Nuclear Medicine, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- 5Department of Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, The Affiliated Hospital of Southwest Medical University, Luzhou, China
Background: Pancreatic cancer (PC) and pancreatitis—encompassing acute, chronic, autoimmune, and other inflammatory pancreatic conditions—often exhibit overlapping clinical and imaging features, yet require fundamentally different therapeutic strategies. This similarity frequently leads to diagnostic uncertainty in routine clinical practice. Image-based artificial intelligence (AI) has emerged as a promising tool to enhance diagnostic accuracy. This meta-analysis systematically evaluates the diagnostic performance of AI algorithms in differentiating PC from pancreatitis.
Methods: A systematic literature search of PubMed, Embase, and Cochrane Library databases was conducted for studies published through June 30, 2025. Eligible studies reporting AI diagnostic performance metrics were selected. Methodological rigor was assessed using the modified Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Pooled sensitivity (SEN), specificity (SPE), positive/negative likelihood ratios (+LR/-LR), diagnostic odds ratio (DOR), and summary receiver operating characteristic (SROC) curves were derived using Stata 17.0 software.
Results: Twenty-eight eligible studies (3,279 patients) were ultimately included, contributing 76 contingency tables to this meta-analysis. Across all 76 tables, the pooled SEN was 89% (95% CI: 87–90%), SPE was 88% (95% CI: 86–90%), and AUC was 0.94 (95% CI: 0.92–0.96); however, substantial heterogeneity was observed among the included studies (I² = 77.14% for SEN and I² = 75.61% for SPE). When only the best-performing contingency table from each of the 28 studies was analyzed, the pooled SEN and SPE were 91% (95% CI: 88–93%) and 90% (95% CI: 87–93%), with an AUC of 0.96 (95% CI: 0.94–0.97). Analysis by algorithm type revealed a pooled SEN of 89% (95% CI: 86–90%) and SPE of 88% (95% CI: 86–90%) for machine learning, and a pooled SEN of 89% (95% CI: 82–93%) and SPE of 85% (95% CI: 76–91%) for deep learning. Subsequent subgroup analyses suggested that part of the heterogeneity might be explained by differences in algorithm type, imaging modality, geographical region, and publication year.
Conclusion: AI-based image analysis demonstrates strong diagnostic performance in distinguishing PC from pancreatitis, exceeding thresholds typically achieved with conventional imaging alone. These findings support the potential integration of AI into clinical decision-support workflows to improve the preoperative evaluation of pancreatic lesions.
Systematic review registration: https://www.crd.york.ac.uk/prospero/, identifier CRD42024529580.
1 Introduction
Pancreatic cancer (PC), characterized by its aggressive biological behavior, rapid progression, and dismal prognosis, remains one of the most lethal malignancies worldwide. While surgical resection represents the sole curative option for patients with PC (1, 2), its clinical management is complicated by the diagnostic challenge of distinguishing it from pancreatitis such as mass-forming pancreatitis (MFP), mass-forming chronic pancreatitis (MFCP), autoimmune pancreatitis (AIP), focal forms of chronic pancreatitis, and other focal inflammatory lesions. These two entities share overlapping clinical presentations (including weight loss, abdominal pain, and pancreatic insufficiency) and imaging features (3, 4), yet demand diametrically opposed treatment strategies: pancreatitis typically responds to medical therapy whereas PC requires radical resection. Accurate preoperative differentiation is therefore critical to avoid two detrimental scenarios—delayed intervention in early-stage PC (potentially compromising survival outcomes) and unnecessary pancreatic surgery in pancreatitis cases (5–8).
Imaging diagnosis plays a pivotal role in this differential diagnostic process (9, 10). Conventional radiographic evaluation relies heavily on qualitative assessment of morphological features, yet modern imaging techniques [such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET)] generate vast amounts of quantitative data that may provide greater diagnostic value than traditional imaging biomarkers (11–14). Current clinical practice, however, remains constrained by interobserver variability and a reliance on radiologists’ subjective experience in evaluating heterogeneous pancreatic lesions (15, 16).
The emergence of artificial intelligence (AI) has introduced transformative potential in oncological imaging. Contemporary research demonstrates AI’s capacity to extract high-throughput quantitative features from medical images, including subvisual patterns imperceptible to human observers—thereby enhancing diagnostic precision and prognostic stratification (17–19). Technological advancements in computational power, algorithm optimization, and the availability of large-scale imaging datasets have accelerated AI applications via multimodal data integration (encompassing radiological, histopathological, genomic, and metabolic information) to refine clinical decision-making (20–22). Broadly, image-based AI methods can be categorized into radiomics-based machine learning (ML) and deep learning (DL). Radiomics-based ML requires manual or semi-automatic segmentation of lesions, followed by extraction of handcrafted quantitative features and classification using algorithms such as support vector machines (SVMs), random forests (RFs), or LASSO regression. By contrast, DL methods such as convolutional neural networks (CNNs) or deep learning radiomics (DLR) automatically learn and optimize image features in an end-to-end manner with minimal manual input. In our systematic review, the majority of included studies adopted radiomics-based ML approaches, whereas a smaller subset used DL architectures. This context provides an essential framework for interpreting the pooled diagnostic performance reported in the following sections. Notably, evidence from multiple comparative studies indicates that AI systems can perform at least comparably to radiologists, with some reports showing similar or occasionally superior accuracy in differentiating PC from pancreatitis across different imaging modalities (23). Nevertheless, a comprehensive quantitative synthesis of this evidence remains lacking (24, 25).
In this study, we conducted a comprehensive review and meta-analysis of published diagnostic data to better understand how well AI algorithms and models perform in the differential diagnosis of PC from pancreatitis and to explore their clinical applicability.
2 Materials and methods
This systematic review was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-analysis for Diagnostic Test Accuracy (PRISMA-DTA) guidelines (26). The protocol of this meta-analysis was registered and is available at PROSPERO (No. CRD42024529580). Literature search, data extraction, and quality assessment were performed independently by two reviewers (H.Y. Zhang and J. Lu).
2.1 Literature search and selection
PubMed, Embase, and the Cochrane Library databases were systematically searched for studies published until June 30, 2025. The search was limited to English-language publications. A detailed search strategy was tailored for each database, using a combination of Medical Subject Headings (MeSH) terms and free-text keywords. The search strategy included the following subject headings and abstract words: artificial intelligence, radiomics, deep learning, machine learning, pancreas, pancreatic cancer, pancreatic adenocarcinoma, mass-forming pancreatitis, autoimmune pancreatitis, and focal chronic pancreatitis. The search strategy was: (“radiomics” OR “deep learning” OR “machine learning” OR “artificial intelligence”) AND (“pancreatic cancer” OR “pancreatic adenocarcinoma”) AND (“mass-forming pancreatitis” OR “autoimmune pancreatitis” OR “focal chronic pancreatitis”). Boolean operators (“AND”, “OR”) and proximity operators were used to broaden the search and capture variations in terminology. The complete electronic search strategies for all databases, including MeSH terms, keywords, Boolean operators, and date limits, are provided in Supplementary Table S1 for transparency and reproducibility. The strategy was adjusted to accommodate each database’s specific indexing systems and syntaxes. Following deduplication, two investigators independently screened titles and abstracts. Articles deemed potentially relevant underwent full-text review against predefined inclusion criteria. Studies satisfying these criteria were incorporated into qualitative synthesis and meta-analyses. Additionally, backward citation tracking of included articles and related systematic reviews was performed to identify supplementary evidence. Disagreements over study selection were resolved by consulting the corresponding author.
2.2 Inclusion criteria
The inclusion criteria were designed according to the PICOS framework to ensure a comprehensive and unbiased search:
● P (Population) and S (Study): patients with PC [including pancreatic cancer (PC) or pancreatic ductal adenocarcinoma (PDAC)] or pancreatitis [including mass-forming pancreatitis (MFP), chronic pancreatitis (CP), autoimmune pancreatitis (AIP), mass-forming chronic pancreatitis (MFCP), and other focal inflammatory lesions] confirmed by histological or clinical diagnoses.
● I (Intervention): AI-based diagnostic systems (e.g., ML models and DL algorithms) designed to differentiate PC from pancreatitis.
● C (Comparator): Histopathological confirmation (biopsy/surgical specimen) or comprehensive clinical diagnosis (imaging follow-up ≥ 12 months with multidisciplinary consensus) results were used as the reference standard to compare the performance of AI algorithms.
● O (Outcomes): Diagnostic accuracy for differentiating pancreatitis from PC. The performance of AI algorithms was assessed through key metrics, including the area under the curve (AUC), sensitivity, and specificity data of the models or the corresponding information for constructing a 2 × 2 matrix table.
Exclusion criteria were as follows: (a) studies involving patients who had received neoadjuvant therapy (including immunotherapy, radiotherapy, or chemotherapy) prior to diagnostic imaging; (b) studies failing to provide extractable data for reconstructing 2×2 contingency tables or explicit diagnostic validation protocols; (c) non-original research publications (case reports, systematic reviews, editorials, and letters) and conference abstracts without full methodological details; and (d) investigations exclusively focusing on technical validation phases such as image segmentation algorithms or radiomic feature extraction pipelines.
2.3 Data extraction and quality assessment
Two reviewers (H.Y. Zhang and J. Lu) independently extracted the data from the included studies. The following data were extracted from the eligible studies: (a) study characteristics (first author, year of publication, country, and study design); (b) lesion characteristics and numbers of PC or pancreatitis cases; (c) AI algorithm and subtype (DL or ML); (d) imaging modality (CT, MRI, ECT, and PET/CT); (e) the reference standard for pancreatitis and PDAC (histopathological or clinical diagnosis); and (f) diagnostic accuracy data, including true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), which were extracted directly into contingency tables and used to calculate SEN and SPE. If a study provided multiple contingency tables for the same or different AI algorithms, we assumed that they were independent of each other.
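As an illustration of step (f), the per-study accuracy metrics pooled in this review can be computed from a reconstructed 2 × 2 table. The sketch below uses hypothetical counts, not values drawn from any included study:

```python
# Illustrative sketch: per-study diagnostic accuracy metrics from an
# extracted 2x2 contingency table. The counts are hypothetical examples.

def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, and derived metrics for one study."""
    sen = tp / (tp + fn)  # proportion of PC cases correctly called PC
    spe = tn / (tn + fp)  # proportion of pancreatitis cases correctly called pancreatitis
    return {
        "sensitivity": sen,
        "specificity": spe,
        "pos_lr": sen / (1 - spe),       # positive likelihood ratio (+LR)
        "neg_lr": (1 - sen) / spe,       # negative likelihood ratio (-LR)
        "dor": (tp * tn) / (fp * fn),    # diagnostic odds ratio (DOR)
    }

metrics = diagnostic_metrics(tp=90, fp=12, fn=10, tn=88)
print(round(metrics["sensitivity"], 2))  # 0.9
print(round(metrics["specificity"], 2))  # 0.88
```

In the actual analysis these per-table estimates were pooled with a bivariate random-effects model in Stata, as described in Section 2.4.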
The quality of the reviewed studies was evaluated using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) criteria. The tool assesses four key domains: “Patient Selection”, “Index Test”, “Reference Standard”, and “Flow and Timing”. For each domain, two reviewers independently evaluated the risk of bias (high/low/unclear) and applicability concerns (high/low). Discrepancies were resolved through discussion or consultation with a third researcher.
2.4 Meta-analysis
Statistical analyses were conducted using Stata 17.0 (StataCorp LLC, College Station, Texas, USA) with a bivariate random-effects model. Coupled forest plots were generated to visually present pooled sensitivity and specificity. The pooled sensitivity and specificity for the differential diagnosis of pancreatitis and PC, with their 95% confidence intervals (CIs), were calculated from the data extracted from each individual study. Sensitivity was calculated as the proportion of patients with PC who were correctly classified as PC, and specificity as the proportion of patients with pancreatitis who were correctly classified as pancreatitis. I² values were computed to assess statistical heterogeneity, categorized as very low (0%–25%), low (25%–50%), medium (50%–75%), and high (>75%) (19, 20). Publication bias was determined by visual assessment of Deeks’ funnel plot, and statistical significance was evaluated by Deeks’ asymmetry test. p < 0.05 was considered statistically significant.
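The I² categorization above derives from Cochran's Q statistic, I² = max(0, (Q − df)/Q) × 100. A minimal sketch of this calculation, assuming hypothetical logit-transformed per-study estimates and variances (not values from the included studies):

```python
# Minimal sketch of Cochran's Q and the I^2 heterogeneity statistic.
# Inputs are hypothetical per-study effect estimates (e.g., logit
# sensitivities) and their variances, not data from the included studies.

def cochran_q_i2(estimates: list, variances: list) -> tuple:
    """Return (Q, I^2 in percent) under a fixed-effect weighting scheme."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

q, i2 = cochran_q_i2([2.0, 2.2, 1.5, 2.8], [0.04, 0.05, 0.04, 0.05])
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```

With these synthetic inputs, I² falls in the >75% band that this review classifies as high heterogeneity, the threshold that triggered the subgroup analyses below.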
If significant heterogeneity was present, a subgroup analysis was conducted to investigate its cause. The following covariates were used in the subgroup analysis: (a) subtype of AI algorithms (ML vs. DL); (b) imaging modalities (US, CT, MRI, or PET); (c) pooled performance on the same dataset (AI algorithms vs. human clinicians); (d) geographical distribution (Asia vs. non-Asia); (e) number of institutions from which data were obtained (single vs. multiple centers); (f) patient numbers (≤100 vs. >100); (g) publication year (before 2020 vs. after 2020); (h) risk of bias, with studies categorized as low risk or high/unclear risk based on the overall judgment across the four domains (patient selection, index test, reference standard, and flow/timing); and (i) specific AI algorithm (LASSO vs. SVM vs. ANN).
2.5 Sensitivity analysis
A sensitivity analysis was performed by excluding studies with a high or unclear risk of bias to evaluate the influence of lower-quality studies on the overall pooled results. The pooled sensitivity, specificity, diagnostic odds ratio (DOR), and area under the summary receiver operating characteristic curve (AUC) were recalculated for the remaining studies. The results were compared with the primary analysis to assess the stability and robustness of the findings.
3 Results
3.1 Study selection and characteristics of eligible studies
A total of 408 studies were retrieved during the initial search. After removing 119 duplicates, 135 studies were excluded based on title and abstract screening, leaving 94 studies eligible for full-text review. Ultimately, 28 articles were included in the systematic review, all of which provided sufficient data for meta-analysis. The final analysis incorporated data from 3,279 patients. A total of 76 contingency tables, including TP, FP, TN, and FN, were extracted from the 28 eligible studies (3, 24, 25, 27–49). The complete details of the literature search and screening process are illustrated in Figure 1 and Supplementary Table S1.
The included studies were published between 2001 and 2025, with a notable concentration of 20 publications (71.4%) between 2019 and 2023. The majority of the studies (n = 17) were conducted in China, while the remaining studies were carried out in the United States, Japan, Romania, Germany, and India. Among the included studies, 25 employed retrospective designs, and 3 utilized prospective designs. Twenty-two studies were conducted at a single center, while six involved multiple centers. Eight studies compared the performance of AI models with clinicians using the same dataset. Twenty-four studies explicitly excluded low-quality images, whereas the remaining studies did not mention this process.
The imaging modalities used in the studies were categorized as follows: CT (n = 11), MRI (n = 4), US (n = 9), and PET/CT (n = 4). In terms of AI algorithms, the distribution was as follows: ML (25 studies) and DL (3 studies). Detailed characteristics of the included studies are presented in Tables 1, 2 and Supplementary Tables S2, S3.
3.2 Quality assessment
The quality of the included studies was determined by QUADAS-2 (Supplementary Figure S1); the detailed assessment results are presented in Supplementary Figure S2. Over half of the studies (n = 16) showed a high or unclear risk of bias for patient selection because they did not clearly describe patient characteristics, including prior testing, clinical presentation, study setting, intended use of the index test, or external evaluation. The risk of bias in the index test was low in 13 studies (46%) and high or unclear in 15 studies (54%). The risk of bias in the reference standard domain was consistently low across all studies. In the flow and timing domain, 15 studies were rated unclear and 13 were rated low. Overall, the analysis revealed a potential risk of bias chiefly in patient selection, with the absence of randomization serving as a primary contributing factor; nevertheless, the included literature was deemed of sufficient quality for subsequent analyses.
3.3 Pooled performance of AI algorithms
The summary receiver operating characteristic (SROC) curves for the 28 included studies, comprising 76 contingency tables, are presented in Figures 2, 3. The pooled sensitivity (SEN) and specificity (SPE) were 89% (95% CI: 87%–90%) and 88% (95% CI: 86%–90%), respectively, with an AUC of 0.94 (95% CI: 0.92–0.96) for all AI algorithms. When the highest-accuracy contingency table was selected from each of these 28 studies (Table 3), the pooled SEN and SPE were 91% (95% CI: 88%–93%) and 90% (95% CI: 87%–93%), respectively, with an AUC of 0.96 (95% CI: 0.94–0.97), as shown in Figures 2, 4. However, substantial heterogeneity was observed among the included studies (I² = 77.14% for SEN and 75.61% for SPE).
Figure 2. (a) Summary receiver operating characteristic (SROC) curve (28 studies with 76 tables). (b) SROC curves of studies when selecting contingency tables reporting the highest accuracy (28 studies with 28 best values).
Figure 3. Forest plots of SEN and SPE for all contingency tables (28 studies with 76 tables).
Figure 4. Forest plots of SEN and SPE when selecting the contingency table reporting the highest accuracy from each study (28 studies with 28 tables).
Table 3. The best diagnostic performance of image-based AI algorithm for differentiating PC from pancreatitis.
3.4 Publication bias
Funnel plots indicated no significant publication bias among the 28 included studies with 76 contingency tables (p = 0.18) and when the highest accuracy contingency table was selected (p = 0.70), as illustrated in Figures 5a, b.
Figure 5. (a) Funnel plots of the 28 studies with 76 tables. (b) Funnel plots of studies when selecting contingency tables reporting the highest accuracy (28 studies with 28 best values).
3.5 Subgroup and sensitivity analyses
The detailed results of the subgroup analyses and exploration of potential sources of between-study heterogeneity are shown in Table 4 and Supplementary Figures S3–S20. Considering the developmental stage and inherent differences of the algorithms, we categorized them into ML and DL groups and performed subgroup analyses. In the DL group, the pooled SEN and SPE were 89% (95% CI: 82%–93%) and 85% (95% CI: 76%–91%), respectively. In the ML group, the pooled SEN and SPE were 89% (95% CI: 86%–90%) and 88% (95% CI: 86%–90%), respectively (Supplementary Figures S3a, b, S12).

For the different imaging modalities, nine US studies had a pooled SEN of 90% (95% CI: 88%–92%), a pooled SPE of 88% (95% CI: 83%–91%), and an AUC of 0.94 (95% CI: 0.92–0.96). Eleven CT studies had a pooled SEN of 89% (95% CI: 86%–92%), a pooled SPE of 90% (95% CI: 86%–93%), and an AUC of 0.96 (95% CI: 0.93–0.97). Four MRI studies had a pooled SEN of 88% (95% CI: 83%–92%), a pooled SPE of 86% (95% CI: 82%–89%), and an AUC of 0.89 (95% CI: 0.86–0.91). Four PET/CT studies had a pooled SEN of 85% (95% CI: 81%–87%), a pooled SPE of 86% (95% CI: 74%–93%), and an AUC of 0.87 (95% CI: 0.84–0.90) (Supplementary Figures S4a–d, S13).

In the eight studies using the same datasets, AI had a pooled SEN of 88% (95% CI: 85%–91%), an SPE of 88% (95% CI: 85%–90%), and an AUC of 0.93 (95% CI: 0.91–0.95), whereas clinicians had a pooled SEN of 77% (95% CI: 66%–85%), an SPE of 80% (95% CI: 71%–87%), and an AUC of 0.85 (95% CI: 0.82–0.88) (Supplementary Figures S5a, b, S14).

Twenty-one studies were conducted in Asia, and seven were conducted outside Asia. In the Asia group, the pooled SEN and SPE were 88% (95% CI: 86%–90%) and 88% (95% CI: 85%–90%), respectively, with an AUC of 0.94 (95% CI: 0.92–0.96). In the non-Asia group, the pooled SEN and SPE were 90% (95% CI: 88%–91%) and 89% (95% CI: 85%–92%), respectively, with an AUC of 0.92 (95% CI: 0.90–0.94) (Supplementary Figures S6a, b, S15).
Twenty-two studies used single-center data and six used multi-center data. In the single-center group, the pooled SEN and SPE were 88% (95% CI: 86%–90%) and 88% (95% CI: 85%–90%), respectively, with an AUC of 0.94 (95% CI: 0.92–0.96). In the multi-center group, the pooled SEN and SPE were 91% (95% CI: 86%–94%) and 88% (95% CI: 84%–92%), respectively, with an AUC of 0.95 (95% CI: 0.92–0.96) (Supplementary Figures S7a, b, S16).

Fourteen studies had sample sizes ≤100 and 14 had sample sizes >100. For studies with ≤100 patients, the pooled SEN and SPE were 87% (95% CI: 85%–90%) and 88% (95% CI: 84%–91%), respectively, with an AUC of 0.94 (95% CI: 0.91–0.95). For studies with >100 patients, the pooled SEN and SPE were 90% (95% CI: 87%–92%) and 88% (95% CI: 85%–90%), respectively, with an AUC of 0.95 (95% CI: 0.92–0.96) (Supplementary Figures S8a, b, S17).

Seven studies were published before 2020 and 21 after 2020. The pooled SEN was 90% (95% CI: 86%–93%) for studies published before 2020 and 90% (95% CI: 86%–93%) for those published after 2020; the SPE was 88% (95% CI: 86%–90%) and 88% (95% CI: 85%–90%), respectively; and the AUC was 0.95 (95% CI: 0.93–0.97) and 0.94 (95% CI: 0.92–0.96), respectively (Supplementary Figures S9a, b, S18).

Nine studies were rated as having a low risk of bias and 19 as a high or unclear risk. The pooled SEN was 89% (95% CI: 85%–91%) for the high/unclear group and 89% (95% CI: 84%–93%) for the low-risk group; the SPE was 88% (95% CI: 86%–90%) and 88% (95% CI: 85%–90%), respectively; and the AUC was 0.95 (95% CI: 0.92–0.96) and 0.94 (95% CI: 0.92–0.94), respectively (Supplementary Figures S10a, b, S19).

Six studies used LASSO, six used SVM, and five used ANNs. For the LASSO group, the pooled SEN and SPE were 86% (95% CI: 83%–89%) and 87% (95% CI: 84%–90%), respectively (AUC 0.92, 95% CI: 0.90–0.94). For the SVM group, the pooled SEN and SPE were 95% (95% CI: 90%–98%) and 87% (95% CI: 82%–94%), respectively (AUC 0.97, 95% CI: 0.95–0.98).
For the ANN group, the pooled SEN and SPE were 88% (95% CI: 84%–91%) and 84% (95% CI: 73%–91%), respectively (AUC 0.90, 95% CI: 0.87–0.93) (Supplementary Figures S11a–c, S20). Sensitivity analysis results are shown in Supplementary Table S5.
4 Discussion
Focal inflammatory pancreatic lesions (FIPLs) are localized inflammatory masses arising from acute or chronic pancreatitis and often mimic PC on imaging, leading to diagnostic uncertainty. Accurately distinguishing PC from FIPLs remains clinically important because the two conditions require fundamentally different management strategies: although their clinical presentations are similar, their treatment needs and survival rates differ markedly. Driven by the rapid growth of AI technology, a large number of studies over the last decade have sought to differentiate PC from pancreatitis by extracting and analyzing imaging features with AI algorithms. This study comprehensively assessed the use of AI algorithms for differentiating PC from pancreatitis, with the aim of providing high-level evidence on the feasibility of AI for promoting precision medicine in PC management. To our knowledge, this is the first systematic review and meta-analysis of the diagnostic performance of AI algorithms in this field. After careful selection of studies on relevant topics, we found that AI algorithms excelled in differentiating PC from pancreatitis on medical imaging, with a pooled SEN of 89% (95% CI: 87%–90%), a pooled SPE of 88% (95% CI: 86%–90%), and an AUC of 0.94 (95% CI: 0.92–0.96) across 76 contingency tables from 28 studies; when only the best-performing contingency table from each study was considered, performance was even higher, with a pooled SEN of 91% (95% CI: 88%–93%), a pooled SPE of 90% (95% CI: 87%–93%), and an AUC of 0.96 (95% CI: 0.94–0.97).
4.1 Key findings and clinical implications
The high diagnostic accuracy of AI algorithms in this meta-analysis highlights their potential to complement or even surpass conventional imaging methods. The overlapping clinical and imaging features of PC and pancreatitis often lead to diagnostic uncertainty, which can result in delayed treatment for PC or unnecessary surgery for pancreatitis (50, 51). AI’s ability to extract and analyze high-throughput quantitative features from imaging data offers a promising solution to this problem (52). Notably, the pooled SEN and SPE of AI algorithms exceeded the performance thresholds typically observed in conventional imaging, suggesting that AI could serve as a valuable decision-support tool in clinical practice (53).
Subgroup analyses revealed that ML and DL algorithms demonstrated comparable diagnostic efficacy, with ML achieving an SEN of 89% and an SPE of 88%, and DL achieving an SEN of 89% and an SPE of 85%. This suggests that the choice of algorithmic architecture may depend on specific clinical contexts and available computational resources. When stratified by imaging modality, AI models demonstrated generally high diagnostic performance across all four modalities: CT-based models achieved the highest pooled accuracy (AUC 0.96), followed by ultrasound-based AI (AUC 0.94) and MRI-based approaches (AUC 0.89), while PET/CT-based AI yielded a somewhat lower but still clinically meaningful accuracy (AUC 0.87). These findings indicate that while AI can provide valuable diagnostic support across different imaging modalities, performance levels are not uniform, and the inclusion of heterogeneous imaging modalities likely contributed to the overall inconsistency observed in the meta-analysis. Future research with larger, modality-specific datasets will be needed to determine the optimal imaging context for AI application in distinguishing PC from pancreatitis.
When analyzed separately, AI demonstrated generally high diagnostic performance across all four imaging modalities, although important differences were observed. CT-based AI models achieved the highest pooled accuracy with an AUC of 0.96, an SEN of 89% (95% CI: 86%–92%), and an SPE of 90% (95% CI: 86%–93%), and showed relatively higher heterogeneity (I² = 82.85% in SEN and I² = 73.43% in SPE). This strong performance may be attributed to the widespread availability of CT, larger training datasets, and more standardized acquisition protocols in the included studies.
Ultrasound-based AI also performed well, with an AUC of 0.94, an SEN of 90% (95% CI: 88%–92%), and an SPE of 88% (95% CI: 83%–91%), although heterogeneity was moderate (I² = 54.18% in SEN and I² = 69.71% in SPE). This variability likely reflects operator dependence and inconsistent image quality across centers. The relatively strong performance despite these limitations suggests that AI may help mitigate inter-operator variability by standardizing interpretation, though most studies were single-center with limited sample sizes.
MRI-based models reached a pooled AUC of 0.89, an SEN of 88% (95% CI: 83%–92%), and an SPE of 86% (95% CI: 82%–89%), with high heterogeneity in SEN (I² = 78.68%) but low heterogeneity in SPE (I² = 15.83%). While MRI offers superior soft-tissue contrast, the diversity of acquisition sequences (T1WI, T2WI, and DWI) and the lack of harmonization across institutions likely contributed to the variability. Standardized MRI protocols will be critical for improving reproducibility in future studies.
PET/CT-based AI demonstrated the lowest pooled accuracy, with an AUC of 0.87, an SEN of 85% (95% CI: 81%–87%), and an SPE of 86% (95% CI: 74%–93%), accompanied by low heterogeneity in SEN (I² = 29.43%) but substantial heterogeneity in SPE (I² = 91.07%). Small study numbers, differences in tracer selection, and heterogeneous reconstruction protocols may explain these results.
Collectively, these findings indicate that although AI shows diagnostic value across modalities, the magnitude of benefit is not uniform. The overall high heterogeneity (I² > 75%) in our meta-analysis can be explained, at least in part, by modality-specific differences. From a clinical perspective, CT- and ultrasound-based AI appear most promising for near-term application due to higher accuracy and accessibility, whereas MRI- and PET/CT-based approaches will require larger, multicenter datasets and standardized acquisition protocols to reduce heterogeneity and establish reliable clinical utility.
In our subgroup analysis by algorithm type, SVM-based models achieved the highest pooled diagnostic accuracy (AUC 0.97), followed by LASSO and ANN frameworks. Although these differences were not statistically significant, the trend suggests that kernel-based classifiers such as SVM may be particularly effective for radiomics-derived imaging features, whereas ANN performance may be limited by small sample sizes and non-standardized input data.
4.2 Comparison with human clinicians
A notable finding from this meta-analysis is the consistently higher pooled diagnostic performance of AI compared with clinicians in studies that directly evaluated both approaches on the same patient datasets. Across eight studies, AI achieved a pooled sensitivity of 88% and a pooled specificity of 88% (AUC 0.93), whereas clinicians achieved a pooled sensitivity of 77% and a pooled specificity of 80% (AUC 0.85). These results suggest that AI systems may provide comparable or numerically higher diagnostic accuracy than clinicians in differentiating PC from pancreatitis. This difference should nevertheless be interpreted cautiously: the included studies were heterogeneous in imaging modality, algorithm type, and clinician experience; they lacked formal paired statistical testing; and the advantage was not consistent across all modalities and study settings, with several reports indicating comparable performance between AI and radiologists. AI is therefore not intended to replace clinicians but to augment their diagnostic capabilities by providing reproducible, quantitative assessments, and it cannot yet be regarded as superior. In clinical practice, AI could be particularly useful in cases where imaging features are subtle or ambiguous, or where inter-observer variability is high, offering an additional layer of decision support, especially in settings with limited subspecialty expertise. Still, the variability in algorithm design, training datasets, and external validation highlights the need for caution.
Future multicenter prospective trials are required to determine whether AI can achieve robust and generalizable improvements in real-world workflows, and to clarify the conditions under which AI provides incremental benefit over expert interpretation (54).
4.3 Impact of study quality on robustness
The sensitivity analysis demonstrated that the pooled diagnostic accuracy of AI was largely unaffected by the exclusion of studies with high risk of bias. The recalculated estimates for sensitivity, specificity, DOR, and AUROC were nearly identical to those of the primary analysis, reinforcing the stability and reliability of the overall findings (Supplementary Table S5). Slight numerical differences were observed, notably a marginal increase in DOR and AUROC after excluding high-bias studies, but these changes were not substantial. This trend may indicate that higher-quality studies tend to yield more consistent and slightly stronger diagnostic performance, possibly owing to standardized patient selection and imaging protocols.
Nevertheless, caution is warranted, as such differences could also arise from variations in sample size, imaging modality, or algorithm type rather than methodological quality alone. Overall, the consistency of the results across sensitivity analyses supports the robustness of the pooled conclusions and underscores the importance of maintaining methodological rigor in future AI-based diagnostic research.
4.4 Sources of heterogeneity and limitations
Despite the promising results, significant heterogeneity was observed among the included studies (I² > 75%). Subgroup analyses identified several factors contributing to this heterogeneity, including algorithmic architecture, imaging modality, geographic origin, and publication year. For instance, studies conducted in Asia demonstrated slightly higher diagnostic accuracy (AUC = 0.95) compared to those outside Asia (AUC = 0.92). Similarly, single-center studies showed marginally better performance than multi-center studies, possibly due to standardized imaging protocols and reduced variability in data collection (54). These findings suggest that modality-specific factors, such as image resolution, acquisition protocols, and dataset availability, may partly explain the inconsistency observed in the overall pooled results. Nevertheless, the number of studies per modality remains limited, which restricts the ability to draw firm conclusions. Future studies focusing on single-modality AI applications, ideally with larger and more standardized datasets, will be essential to clarify the role of imaging modality in diagnostic performance.
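As context for the heterogeneity statistics reported above, I² quantifies the proportion of total between-study variation attributable to heterogeneity rather than chance, and is derived from Cochran's Q under an inverse-variance model. The following minimal Python sketch illustrates the calculation; the effect estimates and variances are hypothetical placeholders, not values from the included studies.

```python
# Higgins' I-squared from Cochran's Q under a fixed-effect inverse-variance model.
# The effect estimates and variances below are illustrative placeholders,
# not data from the studies pooled in this meta-analysis.

def i_squared(effects, variances):
    """Return I^2 (%) for per-study effect estimates with known variances."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))  # Cochran's Q
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

# Identical estimates give I^2 = 0 (no variation beyond chance).
print(i_squared([1.0, 1.0, 1.0], [0.04, 0.04, 0.04]))  # 0.0
# Two divergent estimates with equal weight: Q = 2, df = 1, so I^2 = 50%.
print(round(i_squared([0.0, 2.0], [1.0, 1.0]), 1))     # 50.0
```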
There are several limitations in our meta-analysis. First, almost all of the included studies were retrospective, which introduces potential bias in patient selection. Second, for some studies we manually calculated the data needed to construct the confusion matrix because performance metrics were incompletely reported, and other studies had to be excluded because the numbers of TP, FP, FN, and TN could not be extracted or calculated; pooled estimates from the quantitative analyses should therefore be interpreted with caution. Third, despite the subgroup analyses and sensitivity tests performed, the sources of heterogeneity remain to be fully identified. Similarly high heterogeneity has been observed in recent systematic reviews evaluating the quality of radiomics models in other fields.
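The manual reconstruction step mentioned in the limitations is simple arithmetic once a study reports sensitivity, specificity, and group sizes. The Python sketch below shows the calculation and the resulting diagnostic odds ratio; the function names and example numbers are our own illustrations, not figures from any included study.

```python
# Reconstruct a 2x2 confusion matrix from reported sensitivity, specificity,
# and the numbers of diseased / non-diseased patients, then compute the DOR.
# All names and example values are illustrative, not from the included studies.

def confusion_from_metrics(sens, spec, n_pos, n_neg):
    tp = round(sens * n_pos)   # true positives among diseased patients
    fn = n_pos - tp            # false negatives
    tn = round(spec * n_neg)   # true negatives among non-diseased patients
    fp = n_neg - tn            # false positives
    return tp, fp, fn, tn

def diagnostic_odds_ratio(tp, fp, fn, tn):
    # DOR = (TP * TN) / (FP * FN); assumes no zero cells
    return (tp * tn) / (fp * fn)

tp, fp, fn, tn = confusion_from_metrics(0.89, 0.88, 100, 100)
print(tp, fp, fn, tn)                                   # 89 12 11 88
print(round(diagnostic_odds_ratio(tp, fp, fn, tn), 1))  # 59.3
```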
4.5 Future directions and conclusion
In summary, AI algorithms demonstrate robust diagnostic performance in differentiating PC from pancreatitis, with CT- and ultrasound-based approaches currently appearing most promising for clinical translation. Importantly, AI should be viewed as an adjunct to, rather than a replacement for, radiologists, providing quantitative, reproducible insights that can support clinical decision-making. Future research should focus on prospective multicenter studies with standardized imaging protocols, external validation, and explainable AI frameworks to facilitate integration into radiology workflows. Combining AI with other data modalities, such as genomic or histopathologic information, may further enhance diagnostic accuracy and pave the way for precision medicine in pancreatic diseases.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.
Author contributions
JL: Writing – original draft, Visualization, Methodology, Investigation, Conceptualization. HZ: Investigation, Methodology, Writing – original draft. ZY: Investigation, Supervision, Writing – original draft. JY: Investigation, Writing – original draft, Supervision. QY: Writing – original draft, Supervision, Investigation. YL: Supervision, Investigation, Writing – review & editing. PJ: Supervision, Writing – review & editing, Investigation. MF: Investigation, Writing – review & editing, Supervision. JZ: Supervision, Writing – review & editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This research was supported by the Health Commission of Sichuan Province Medical Science and Technology Program (24QNMP016 to JZ), the Luzhou Science and Technology Program (2024JYJ041 to JZ), the Southwest Medical University Technology Program (2024ZKY017), the Sichuan Science and Technology Program (2025ZNSFSC1770 to JZ), the Scientific Research Program of the Sichuan Society of Integrative Medicine (ZXY2025022 to JZ), the China Postdoctoral Science Foundation (2025M783899), and the Sichuan Medical Association Innovation Program (S20250059, Q20250020).
Conflict of interest
The authors declare that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that Generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2025.1660271/full#supplementary-material
References
1. Usanase N, Ozsahin DU, David LR, Uzun B, Hussain AJ, and Ozsahin I. Deep learning-based CT-scan image classification for accurate detection of pancreatic cancer: A Comparative Study of Different Pre-Trained Models, in: 2024 17th International Conference on Development in eSystem Engineering (DeSE), Piscataway, NJ, USA: IEEE, 2024 6-8 Nov. (2024).
2. Klein AP. Pancreatic cancer epidemiology: understanding the role of lifestyle and inherited risk factors. Nat Rev Gastroenterol Hepatol. (2021) 18:493–502. doi: 10.1038/s41575-021-00457-x
3. Zhang H, Meng Y, Li Q, Yu J, Liu F, Fang X, et al. Two nomograms for differentiating mass-forming chronic pancreatitis from pancreatic ductal adenocarcinoma in patients with chronic pancreatitis. Eur Radiol. (2022) 32:6336–47. doi: 10.1007/s00330-022-08698-3
4. Almisned FA, Usanase N, Ozsahin DU, and Ozsahin I. Incorporation of explainable artificial intelligence in ensemble machine learning-driven pancreatic cancer diagnosis. Sci Rep. (2025) 15:14038. doi: 10.1038/s41598-025-98298-0
5. Stoffel EM, Brand RE, and Goggins M. Pancreatic cancer: changing epidemiology and new approaches to risk assessment, early detection, and prevention. Gastroenterology. (2023) 164:752–65. doi: 10.1053/j.gastro.2023.02.012
6. Cai J, Chen H, Lu M, Zhang Y, Lu B, You L, et al. Advances in the epidemiology of pancreatic cancer: Trends, risk factors, screening, and prognosis. Cancer Lett. (2021) 520:1–11. doi: 10.1016/j.canlet.2021.06.027
7. Schima W, Böhm G, Rösch CS, Klaus A, Függer R, and Kopf H. Mass-forming pancreatitis versus pancreatic ductal adenocarcinoma: CT and MR imaging for differentiation. Cancer Imaging. (2020) 20:52. doi: 10.1186/s40644-020-00324-z
8. Ozsahin DU, Usanase N, and Ozsahin I. Advancing pancreatic cancer management: the role of artificial intelligence in diagnosis and therapy. Beni-Suef Univ J Basic Appl Sci. (2025) 14:32. doi: 10.1186/s43088-025-00610-4
9. Al-Hawary M. Role of imaging in diagnosing and staging pancreatic cancer. J Natl Compr Canc Netw. (2016) 14:678–80. doi: 10.6004/jnccn.2016.0191
10. Rhee H and Park MS. The role of imaging in current treatment strategies for pancreatic adenocarcinoma. Korean J Radiol. (2021) 22:23–40. doi: 10.3348/kjr.2019.0862
11. Kartalis N. CT. and MRI of pancreatic cancer: there is no rose without a thorn! Eur Radiol. (2018) 28:3482–3. doi: 10.1007/s00330-018-5486-z
12. Ormsby EL, Kojouri K, Chang PC, Lin TY, Vuong B, Ramirez RM, et al. Association of standardized radiology reporting and management of abdominal CT and MRI with diagnosis of pancreatic cancer. Clin Gastroenterol Hepatol. (2023) 21:644–52.e2. doi: 10.1016/j.cgh.2022.03.047
13. Ozsahin I, Uzun B, Mustapha MT, Usanese N, Yuvali M, and Ozsahin DU. Chapter 8 - BI-RADS-based classification of breast cancer mammogram dataset using six stand-alone machine learning algorithms. In: Zgallai WA and Ozsahin DU, editors. Artificial Intelligence and Image Processing in Medical Imaging. Cham, Switzerland: Academic Press (2024). p. 195–216.
14. Ozsahin I, Usanase N, Uzun B, Ozsahin DU, and Mustapha MT. Chapter 7 - A mathematical resolution in selecting suitable magnetic field-based breast cancer imaging modality: a comparative study on seven diagnostic techniques. In: Zgallai WA and Ozsahin DU, editors. Artificial Intelligence and Image Processing in Medical Imaging Cham, Switzerland: Springer (2024). p. 173–94.
15. Lee JH, Shin J, Min JH, Jeong WK, Kim H, Choi SY, et al. Preoperative prediction of early recurrence in resectable pancreatic cancer integrating clinical, radiologic, and CT radiomics features. Cancer Imaging. (2024) 24:6. doi: 10.1186/s40644-024-00653-3
16. Yang E, Kim JH, Min JH, Jeong WK, Hwang JA, Lee JH, et al. nnU-net-based pancreas segmentation and volume measurement on CT imaging in patients with pancreatic cancer. Acad Radiol. (2024) 31:2784–94. doi: 10.1016/j.acra.2024.01.004
17. Pereira SP, Oldfield L, Ney A, Hart PA, Keane MG, Pandol SJ, et al. Early detection of pancreatic cancer. Lancet Gastroenterol Hepatol. (2020) 5:698–710. doi: 10.1016/S2468-1253(19)30416-9
18. Yang J, Xu R, Wang C, Qiu J, Ren B, and You L. Early screening and diagnosis strategies of pancreatic cancer: a comprehensive review. Cancer Commun (Lond). (2021) 41:1257–74. doi: 10.1002/cac2.12204
19. Cao K, Xia Y, Yao J, Han X, Lambert L, Zhang T, et al. Large-scale pancreatic cancer detection via non-contrast CT and deep learning. Nat Med. (2023) 29:3033–43. doi: 10.1038/s41591-023-02640-w
20. Bhinder B, Gilvary C, Madhukar NS, and Elemento O. Artificial intelligence in cancer research and precision medicine. Cancer Discov. (2021) 11:900–15. doi: 10.1158/2159-8290.CD-21-0090
21. Chen M, Copley SJ, Viola P, Lu H, and Aboagye EO. Radiomics and artificial intelligence for precision medicine in lung cancer treatment. Semin Cancer Biol. (2023) 93:97–113. doi: 10.1016/j.semcancer.2023.05.004
22. Usanase N, Usman AG, Ozsahin DU, David LR, Ozsahin I, Uzun B, et al. Hybridized paradigms for the clinical prediction of lung cancer, in: 2024 17th International Conference on Development in eSystem Engineering (DeSE), Piscataway, NJ, USA: IEEE, 2024 6-8 Nov. (2024).
23. Fontenele RC and Jacobs R. Unveiling the power of artificial intelligence for image-based diagnosis and treatment in endodontics: An ally or adversary? Int Endod J. (2025) 58:155–70. doi: 10.1111/iej.14163
24. Linning E, Xu Y, Wu Z, Li L, Zhang N, Yang H, et al. Differentiation of focal-type autoimmune pancreatitis from pancreatic ductal adenocarcinoma using radiomics based on multiphasic computed tomography. J Comput Assist Tomogr. (2020) 44:511–8. doi: 10.1097/RCT.0000000000001049
25. Ziegelmayer S, Kaissis G, Harder F, Jungmann F, Müller T, Makowski M, et al. Deep convolutional neural network-assisted feature extraction for diagnostic discrimination and feature visualization in pancreatic ductal adenocarcinoma (PDAC) versus autoimmune pancreatitis (AIP). J Clin Med. (2020) 9(12):4013. doi: 10.3390/jcm9124013
26. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. (2021) 372:n71. doi: 10.1136/bmj.n71
27. Anai K, Hayashida Y, Ueda I, Hozuki E, Yoshimatsu Y, Tsukamoto J, et al. The effect of CT texture-based analysis using machine learning approaches on radiologists’ performance in differentiating focal-type autoimmune pancreatitis and pancreatic duct carcinoma. Jpn J Radiol. (2022) 40:1156–65. doi: 10.1007/s11604-022-01298-7
28. Zhang Y, Cheng C, Liu Z, Wang L, Pan G, Sun G, et al. Radiomics analysis for the differentiation of autoimmune pancreatitis and pancreatic ductal adenocarcinoma in (18) F-FDG PET/CT. Med Phys. (2019) 46:4520–30. doi: 10.1002/mp.13733
29. Tong T, Gu J, Xu D, Song L, Zhao Q, Cheng F, et al. Deep learning radiomics based on contrast-enhanced ultrasound images for assisted diagnosis of pancreatic ductal adenocarcinoma and chronic pancreatitis. BMC Med. (2022) 20:74. doi: 10.1186/s12916-022-02258-8
30. Zhu M, Xu C, Yu J, Wu Y, Li C, Zhang M, et al. Differentiation of pancreatic cancer and chronic pancreatitis using computer-aided diagnosis of endoscopic ultrasound (EUS) images: a diagnostic test. PloS One. (2013) 8:e63820. doi: 10.1371/journal.pone.0063820
31. Săftoiu A, Vilmann P, Dietrich CF, Iglesias-Garcia J, Hocke M, Seicean A, et al. Quantitative contrast-enhanced harmonic EUS in differential diagnosis of focal pancreatic masses (with videos). Gastrointest Endosc. (2015) 82:59–69. doi: 10.1016/j.gie.2014.11.040
32. Marya NB, Powers PD, Chari ST, Gleeson FC, Leggett CL, Abu Dayyeh BK, et al. Utilisation of artificial intelligence for the development of an EUS-convolutional neural network model trained to enhance the diagnosis of autoimmune pancreatitis. Gut. (2021) 70:1335–44. doi: 10.1136/gutjnl-2020-322821
33. Wei W, Jia G, Wu Z, Wang T, Wang H, Wei K, et al. A multidomain fusion model of radiomics and deep learning to discriminate between PDAC and AIP based on (18)F-FDG PET/CT images. Jpn J Radiol. (2023) 41:417–27. doi: 10.1007/s11604-022-01363-1
34. Ren S, Zhang J, Chen J, Cui W, Zhao R, Qiu W, et al. Evaluation of texture analysis for the differential diagnosis of mass-forming pancreatitis from pancreatic ductal adenocarcinoma on contrast-enhanced CT images. Front Oncol. (2019) 9:1171. doi: 10.3389/fonc.2019.01171
35. Zhang Y, Cheng C, Liu Z, Pan G, Sun G, Yang X, et al. Differentiation of autoimmune pancreatitis and pancreatic ductal adenocarcinoma based on multi-modality texture features in (18)F-FDG PET/CT. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. (2019) 36:755–62. doi: 10.7507/1001-5515.201807012
36. Lu J, Jiang N, Zhang Y, and Li D. A CT based radiomics nomogram for differentiation between focal-type autoimmune pancreatitis and pancreatic ductal adenocarcinoma. Front Oncol. (2023) 13:979437. doi: 10.3389/fonc.2023.979437
37. Ren S, Zhao R, Zhang J, Guo K, Gu X, Duan S, et al. Diagnostic accuracy of unenhanced CT texture analysis to differentiate mass-forming pancreatitis from pancreatic ductal adenocarcinoma. Abdom Radiol (NY). (2020) 45:1524–33. doi: 10.1007/s00261-020-02506-6
38. Săftoiu A, Vilmann P, Gorunescu F, Gheonea DI, Gorunescu M, Ciurea T, et al. Neural network analysis of dynamic sequences of EUS elastography used for the differential diagnosis of chronic pancreatitis and pancreatic cancer. Gastrointest Endosc. (2008) 68:1086–94. doi: 10.1016/j.gie.2008.04.031
39. Malagi AV, Shivaji S, Kandasamy D, Sharma R, Garg P, Gupta SD, et al. Pancreatic mass characterization using IVIM-DKI MRI and machine learning-based multi-parametric texture analysis. Bioeng (Basel). (2023) 10(1):83. doi: 10.3390/bioengineering10010083
40. Săftoiu A, Vilmann P, Gorunescu F, Janssen J, Hocke M, Larsen M, et al. Efficacy of an artificial neural network-based approach to endoscopic ultrasound elastography in diagnosis of focal pancreatic masses. Clin Gastroenterol Hepatol. (2012) 10:84–90.e1. doi: 10.1016/j.cgh.2011.09.014
41. Norton ID, Zheng Y, Wiersema MS, Greenleaf J, Clain JE, and Dimagno EP. Neural network analysis of EUS images to differentiate between pancreatic Malignancy and pancreatitis. Gastrointest Endosc. (2001) 54:625–9. doi: 10.1067/mge.2001.118644
42. Liu J, Hu L, Zhou B, Wu C, and Cheng Y. Development and validation of a novel model incorporating MRI-based radiomics signature with clinical biomarkers for distinguishing pancreatic carcinoma from mass-forming chronic pancreatitis. Transl Oncol. (2022) 18:101357. doi: 10.1016/j.tranon.2022.101357
43. Li J, Liu F, Fang X, Cao K, Meng Y, Zhang H, et al. CT radiomics features in differentiation of focal-type autoimmune pancreatitis from pancreatic ductal adenocarcinoma: A propensity score analysis. Acad Radiol. (2022) 29:358–66. doi: 10.1016/j.acra.2021.04.014
44. Deng Y, Ming B, Zhou T, Wu JL, Chen Y, Liu P, et al. Radiomics model based on MR images to discriminate pancreatic ductal adenocarcinoma and mass-forming chronic pancreatitis lesions. Front Oncol. (2021) 11:620981. doi: 10.3389/fonc.2021.620981
45. Qu W, Zhou Z, Yuan G, Li S, Li J, Chu Q, et al. Is the radiomics-clinical combined model helpful in distinguishing between pancreatic cancer and mass-forming pancreatitis? Eur J Radiol. (2023) 164:110857. doi: 10.1016/j.ejrad.2023.110857
46. Park S, Chu LC, Hruban RH, Vogelstein B, Kinzler KW, Yuille AL, et al. Differentiating autoimmune pancreatitis from pancreatic ductal adenocarcinoma with CT radiomics features. Diagn Interv Imaging. (2020) 101:555–64. doi: 10.1016/j.diii.2020.03.002
47. Ma X, Wang YR, Zhuo LY, Yin XP, Ren JL, Li CY, et al. Retrospective analysis of the value of enhanced CT radiomics analysis in the differential diagnosis between pancreatic cancer and chronic pancreatitis. Int J Gen Med. (2022) 15:233–41. doi: 10.2147/IJGM.S337455
48. Nakamura H, Fukuda M, Matsuda A, Makino N, Kimura H, Ohtaki Y, et al. Differentiating localized autoimmune pancreatitis and pancreatic ductal adenocarcinoma using endoscopic ultrasound images with deep learning. DEN Open. (2024) 4:e344. doi: 10.1002/deo2.344
49. Zhang L, Chen X, Chen Z, Chen W, Zheng J, Zhuo M, et al. Differentiating pancreatic ductal adenocarcinoma and autoimmune pancreatitis using a machine learning model based on ultrasound clinical features. Front Oncol. (2025) 15:1505376. doi: 10.3389/fonc.2025.1505376
50. Wang ZH, Zhu L, Xue HD, and Jin ZY. Quantitative MR imaging biomarkers for distinguishing inflammatory pancreatic mass and pancreatic cancer-a systematic review and meta-analysis. Eur Radiol. (2024) 34:6738–50. doi: 10.1007/s00330-024-10720-9
51. Ohno E, Kawashima H, Ishikawa T, Iida T, Suzuki H, Uetsuki K, et al. Diagnostic performance of endoscopic ultrasonography-guided elastography for solid pancreatic lesions: Shear-wave measurements versus strain elastography with histogram analysis. Dig Endosc. (2021) 33:629–38. doi: 10.1111/den.13791
52. Huang B, Huang H, Zhang S, Zhang D, Shi Q, Liu J, et al. Artificial intelligence in pancreatic cancer. Theranostics. (2022) 12:6931–54. doi: 10.7150/thno.77949
53. Jan Z, El Assadi F, Abd-Alrazaq A, and Jithesh PV. Artificial intelligence for the prediction and early diagnosis of pancreatic cancer: scoping review. J Med Internet Res. (2023) 25:e44248. doi: 10.2196/44248
Keywords: artificial intelligence, preoperative diagnosis, pancreatic cancer, pancreatitis, differential diagnosis, meta-analysis
Citation: Lu J, Zhang H, Yuan Z, Yue J, Yao Q, Liu Y, Jie P, Fan M and Zhao J (2026) Image-based artificial intelligence for preoperative differentiation of pancreatic cancer from pancreatitis: a systematic review and meta-analysis. Front. Oncol. 15:1660271. doi: 10.3389/fonc.2025.1660271
Received: 05 July 2025; Revised: 25 November 2025; Accepted: 03 December 2025;
Published: 12 January 2026.
Edited by:
Almir Galvão Vieira Bitencourt, A. C. Camargo Cancer Center, Brazil
Reviewed by:
Seyed Amir Ahmad Safavi-Naini, Shahid Beheshti University of Medical Sciences, Iran
Natacha Usanase, Near East University, Cyprus
Copyright © 2026 Lu, Zhang, Yuan, Yue, Yao, Liu, Jie, Fan and Zhao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Pingping Jie, j358038363@163.com; Min Fan, fanminmin1518197@163.com; Jie Zhao, zhaoj522@swmu.edu.cn
†These authors have contributed equally to this work