ORIGINAL RESEARCH article

Front. Endocrinol., 08 December 2025

Sec. Cancer Endocrinology

Volume 16 - 2025 | https://doi.org/10.3389/fendo.2025.1684608

This article is part of the Research Topic: Unraveling the Intricate Nexus: Pancreatic Cancer in the Context of Metabolic Syndrome, Diabetes - Associated Molecular Signatures, and Endocrine Signaling Cascades

Machine learning-optimized metabolic biomarker panel for precision screening of early-stage pancreatic cancer in new-onset diabetes

Weiliang Jiang1†, Zhiyuan Cheng1†, Rong Mu1†, Haoran Sun1, Zihao Guo1, Guo Yu1, Dongyan Wang2,3*, Lijuan Yang1*
  • 1Department of Gastroenterology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  • 2Department of Gastroenterology, Gongli Hospital of Shanghai Pudong New Area, Shanghai, China
  • 3School of Gongli Hospital Medical Technology, University of Shanghai for Science and Technology, Shanghai, China

Introduction: New-onset diabetes (NOD) represents a high-risk population for pancreatic ductal adenocarcinoma (PDAC), yet effective early detection tools for this specific subgroup remain an unmet clinical need.

Methods: We conducted a prospective serum metabolomic analysis using UHPLC-MS/MS in 133 NOD patients aged >65 years, including 60 with PDAC (PDAC+NOD) and 73 without (NOD). Multivariate analysis (OPLS-DA) and machine learning approaches were employed to identify and optimize a diagnostic metabolic biomarker panel. Model performance was evaluated using a hold-out validation set following TRIPOD-ML guidelines.

Results: We identified 62 differentially expressed serum metabolites (P<0.05, FDR-corrected), primarily implicating branched-chain amino acid metabolism, bile acid biosynthesis, and sphingolipid signaling pathways. Notably, significant reductions in one-carbon metabolism-related metabolites (serine, glycine, homocysteine) were observed in PDAC+NOD patients. Feature selection yielded an optimized 5-metabolite panel comprising glycine, L-serine, L-methionine, L-homocysteine, and L-homocystine. This panel demonstrated high diagnostic accuracy with an AUC of 0.853 (95% CI: 0.786-0.920) and 75.0% accuracy in distinguishing PDAC+NOD from NOD patients.

Discussion: Our study establishes a foundational metabolic biomarker strategy for precision screening of early-stage PDAC in NOD populations. The dysregulated one-carbon metabolites provide novel mechanistic insights into PDAC pathogenesis and offer actionable targets for clinical assay development. Future validation in multi-center cohorts is warranted to confirm clinical utility.

1 Introduction

Pancreatic ductal adenocarcinoma (PDAC) remains one of the most lethal malignancies, with a five-year survival rate below 10%, largely attributable to late-stage diagnosis and frequent metastatic presentation (1–3). While surgical resection offers the only potential for cure, over 80% of patients present with inoperable, locally advanced, or metastatic disease (3, 4). This underscores the critical need for early detection strategies. The challenge is compounded by the difficulty in accurately assessing disease stage, such as predicting lymph node metastasis, even after tumor identification (5).

The clinical standard serum biomarker, CA19-9, has significant limitations for early detection, including poor sensitivity for early-stage PDAC, lack of expression in Lewis antigen-negative individuals, and non-specific elevation in benign conditions (6–8). Consequently, developing novel biomarkers with superior sensitivity and specificity is imperative.

Extensive efforts have focused on identifying blood-based biomarkers (9–15). Recent research explores diverse avenues, including serum exosomal microRNAs for distinguishing metastatic disease (5), and integrated metabolite-protein models for early detection (16). Metabolomic signatures (17) and radiomics approaches for tumor staging and lymph node prediction also show promise (18, 19). However, the low population incidence of PDAC makes validating the early screening potential of these markers challenging and renders general population screening economically unfeasible, necessitating a focus on high-risk cohorts.

New-onset diabetes (NOD) has emerged as a key risk factor and potential precursor to PDAC, providing a critical window for early detection. Up to 80% of PDAC patients develop hyperglycemia or diabetes within three years preceding their cancer diagnosis (20, 21). The PDAC risk is markedly elevated in individuals with recent-onset NOD (<1 year), demonstrating a 5.4- to 8-fold increased risk compared to the general population, substantially higher than in long-standing diabetes (20, 22, 23). This positions NOD as a prime high-risk group for targeted screening, offering a strategic and cost-effective opportunity to improve early diagnosis.

Prior studies have analyzed serum metabolomics (24) and proteomics (25) in NOD patients with and without PDAC, identifying potential biomarkers. However, these findings lack robust validation, and no biomarker has yet achieved clinical utility for early PDAC detection in this high-risk population.

To address this gap, we employed ultra-high-performance liquid chromatography tandem mass spectrometry (UHPLC-MS/MS) to conduct comprehensive serum metabolomic profiling, comparing NOD patients with and without PDAC. Using multivariate analysis and machine learning, we aimed to identify and validate a specific metabolic biomarker panel for the early detection of PDAC within this high-risk NOD cohort.

2 Materials and methods

2.1 Study design and participants

We conducted a prospective, single-center diagnostic model-development study in accordance with the TRIPOD-ML guidelines. Participants were consecutively recruited from Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, between March 2021 and November 2023.

Initial screening identified 210 eligible patients. After implementing the inclusion and exclusion criteria (detailed below; see Supplementary Figure 1 for participant flow diagram), 133 participants were included in the final analytical cohort: 60 with PDAC and NOD (PDAC+NOD group) and 73 with NOD alone (NOD group). Complete data were available for all key metabolic biomarkers and clinical variables required for model development. All participants provided written informed consent, and the study protocol received approval from the Medical Ethics Committee of Shanghai General Hospital (Approval No: 2021KY095).

Inclusion Criteria comprised: (1) ECOG performance status 0–2 with estimated survival ≥12 weeks; (2) Fasting blood glucose levels meeting diagnostic criteria for glycemically-defined NOD; (3) Age >65 years; (4) NOD diagnosis duration ≤3 years; (5) For PDAC+NOD group: histopathological confirmation of pancreatic ductal adenocarcinoma or clinical diagnosis supported by comprehensive imaging assessment.

Exclusion Criteria included: (1) Significant comorbidities affecting major organ systems; (2) Active infectious diseases; (3) Metastatic cancer or additional primary malignancies; (4) Known high-risk predisposition for pancreatic cancer (e.g., familial pancreatic cancer syndromes); (5) Incomplete information.

2.2 Clinical and laboratory assessment

Venous blood samples were collected following a 12-hour overnight fast for untargeted metabolomic profiling. Comprehensive clinical data were systematically collected, including demographic characteristics, medical history, and laboratory parameters. The laboratory assessment encompassed diabetes duration, fasting glucose, hepatic function markers (total bilirubin, direct bilirubin, total bile acid, albumin), lipid profiles (cholesterol, triglycerides), and established tumor markers (CEA, CA19-9). Pathological staging was determined according to the American Joint Committee on Cancer (AJCC) 8th edition guidelines.

2.3 Chemicals and reagents

HPLC-grade methanol and acetonitrile (Fisher Chemicals), formic acid (Merck), 2-chloro-DL-phenylalanine (Merck), and one-carbon metabolism standards (Shanghai Yuanye Bio-Technology, purity >98%) were used.

2.4 Sample preparation for metabolomics study

Serum samples (100 μL) were mixed with 400 μL methanol and 5 μL fenclonine (internal standard), vortexed, and centrifuged (12,000 r/min, 4°C, 15 min). The supernatant (200 μL) was analyzed using UPLC coupled to an Orbitrap Elite mass spectrometer (parameters in Supplementary Materials).

2.5 Data processing and biomarker screening

Raw LC-MS data were processed using Compound Discoverer™ (version 3.0) for peak detection, retention time alignment, and metabolite identification. Data normalization was performed by total area scaling followed by probabilistic quotient normalization (PQN) using quality control (QC) samples. Multivariate statistical analyses, including PCA and OPLS-DA, were conducted in SIMCA-P (version 14.1), with metabolites exhibiting VIP scores >1.0 selected as candidate biomarkers. QC samples were analyzed every six experimental samples, meeting predefined acceptance criteria (retention time drift <0.1 min; peak area CV <15%; mass accuracy <5 ppm). ComBat correction was applied to address potential batch effects, demonstrating minimal impact on model performance (ΔAUC = 0.008).
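The two-step normalization described above can be illustrated with a minimal sketch. It assumes that the PQN reference profile is the feature-wise median of the total-area-scaled QC injections and that all intensities are positive. The function names and the toy intensities are hypothetical, and the sketch is not the Compound Discoverer implementation.

```python
from statistics import median


def total_area_scale(sample):
    """Scale one sample so that its feature intensities sum to 1."""
    total = sum(sample)
    return [x / total for x in sample]


def pqn_normalize(samples, qc_samples):
    """Total-area scaling followed by probabilistic quotient normalization.

    Assumption: the reference profile is the feature-wise median of the
    total-area-scaled QC injections.
    """
    scaled = [total_area_scale(s) for s in samples]
    scaled_qc = [total_area_scale(q) for q in qc_samples]
    n_features = len(scaled_qc[0])
    reference = [median(q[j] for q in scaled_qc) for j in range(n_features)]

    normalized = []
    for s in scaled:
        # Quotient of every feature against the reference profile
        quotients = [s[j] / reference[j] for j in range(n_features) if reference[j] > 0]
        dilution = median(quotients)  # most probable dilution factor
        normalized.append([x / dilution for x in s])
    return normalized


# Toy data: two QC injections and two samples with the same profile
# measured at different overall dilutions.
qc = [[100.0, 200.0, 300.0, 400.0], [110.0, 190.0, 310.0, 390.0]]
raw = [[50.0, 100.0, 150.0, 200.0], [200.0, 400.0, 600.0, 800.0]]
normalized = pqn_normalize(raw, qc)
```

In this toy case the two samples differ only by a fourfold dilution, so they become identical after normalization.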

2.6 Quantitative analysis of one-carbon metabolites

Targeted quantification of one-carbon metabolites was performed using UPLC-MS/MS (Waters Acquity; AB SCIEX 6500). Detailed instrument parameters are provided in Supplementary Table 1.

2.7 Sample preparation for targeted metabolite analysis

Serum (150 μL) was treated with 50 μL DTT (15 mg/mL), vortexed, mixed with 800 μL methanol, and centrifuged (10,000 rpm, 4°C, 15 min). The supernatant was dried under nitrogen and reconstituted in 75 μL DTT solution (10 μg/mL) prior to analysis.

2.8 Statistical analysis

Data were analyzed in SPSS 25.0. Continuous variables (mean ± SD) were compared using t-tests or Wilcoxon tests; categorical variables (percentages) were assessed with Chi-square tests. Receiver operating characteristic (ROC) analysis and logistic regression were applied, with p < 0.05 considered significant.

2.9 Machine learning model development and validation

To develop a robust diagnostic model for discriminating between PDAC+NOD and NOD patients, a comprehensive machine learning pipeline was implemented using Python 3.11.0 with the scikit-learn library (version 1.0.2). Initially, metabolite concentrations were log-transformed and standardized via z-score normalization. Missing values, which constituted less than 5% of the data, were imputed using the median value for the respective feature.
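As a rough illustration of the preprocessing described above, the sketch below handles a single metabolite column. The order of operations (median imputation, then natural-log transform, then z-scoring) is an assumption, as are the function name and the example concentrations. The population standard deviation is used so that the scaling matches the convention of scikit-learn's `StandardScaler`.

```python
import math
from statistics import mean, median, pstdev


def preprocess_feature(values):
    """Impute, log-transform and z-score one metabolite column.

    `None` marks a missing measurement. Assumed order: median imputation,
    natural-log transform, then z-score scaling.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    logged = [math.log(v) for v in imputed]  # requires positive concentrations
    mu, sd = mean(logged), pstdev(logged)
    return [(v - mu) / sd for v in logged]


glycine = [210.0, 185.0, None, 240.0, 160.0]  # hypothetical concentrations
z = preprocess_feature(glycine)
```

In a cross-validated workflow, the median, mean, and standard deviation would normally be estimated on the training folds only and then applied to the held-out fold.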

For feature selection, Recursive Feature Elimination with 5-fold stratified Cross-Validation (RFECV) was employed to identify the most informative and compact biomarker signature. This process yielded an optimal panel of five one-carbon metabolism metabolites: glycine, L-serine, L-methionine, L-homocysteine, and L-homocystine. Four distinct classification algorithms were subsequently evaluated: Gradient Boosting (GB), Random Forest (RF), a Support Vector Machine (SVM) with a radial basis function kernel, and regularized Logistic Regression (LR). Hyperparameter tuning for each algorithm was conducted using a grid search approach with 5-fold stratified cross-validation on the training data to ensure optimal model configuration.
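The feature selection and tuning steps above could be set up along the following lines. This is a sketch on synthetic data, not the authors' code: the data set generated by `make_classification`, the choice of Gradient Boosting as the RFECV estimator, the AUC scoring, and the small hyperparameter grid are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the cohort: 133 samples, 11 candidate metabolites.
X, y = make_classification(
    n_samples=133, n_features=11, n_informative=5, n_redundant=2,
    random_state=0,
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Recursive feature elimination, scored by cross-validated AUC.
selector = RFECV(
    estimator=GradientBoostingClassifier(n_estimators=50, random_state=0),
    step=1,
    cv=cv,
    scoring="roc_auc",
    min_features_to_select=1,
)
selector.fit(X, y)
X_selected = selector.transform(X)

# Grid search over a small, illustrative hyperparameter grid.
search = GridSearchCV(
    estimator=GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=cv,
    scoring="roc_auc",
)
search.fit(X_selected, y)

print("features kept:", selector.n_features_)
print("best params:", search.best_params_)
```

In a stricter setup, the selector and the grid search would be wrapped in one pipeline so that both are refit inside each outer validation fold.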

Model training and validation were performed using a stratified 5-fold cross-validation scheme across the entire dataset to provide robust and unbiased performance estimates. The model’s diagnostic performance was quantified using the area under the receiver operating characteristic curve (AUC-ROC), accuracy, sensitivity, specificity, precision, and F1-score. To assess the incremental value of the 5-metabolite panel, the final GB model was compared against two baseline models: a univariate model using glycine alone and a multivariate model using the clinical variables of age, BMI, and diabetes duration. Statistical comparisons of AUCs were performed using DeLong’s test, supplemented by Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI) to quantify the enhancement in classification accuracy.
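The performance metrics listed above follow directly from the confusion matrix and from the ranking of predicted scores. The sketch below computes them for a small hypothetical set of labels and scores; the function names, the data, and the 0.5 decision threshold are illustrative assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, precision and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc_rank(y_true, scores):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical out-of-fold predictions: 1 = PDAC+NOD, 0 = NOD
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = auc_rank(y_true, scores)
```

For these toy values, sensitivity is 0.75, specificity is 5/6, and the AUC is 20/24.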

Further, the model’s clinical utility was rigorously assessed. Model calibration was evaluated using calibration curves and the Brier score to measure the concordance between predicted probabilities and actual outcomes. Decision curve analysis (DCA) was performed to estimate the net benefit of using the model for clinical decision-making across a range of threshold probabilities. To contextualize the model’s performance for a real-world screening scenario, positive predictive value (PPV) and negative predictive value (NPV) were calculated based on PDAC prevalence rates of 0.5% and 1.0% reported in NOD populations. Finally, a series of sensitivity and subgroup analyses were conducted to test the model’s robustness across different normalization methods, age strata, BMI categories, and in patient cohorts defined by metabolite ratios, such as the glycine/serine ratio.
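Two of the quantities mentioned above have simple closed forms. The Brier score is the mean squared difference between predicted probability and observed outcome, and the net benefit at threshold probability pt is TP/n - (FP/n) × pt/(1 - pt). The sketch below applies both to hypothetical predictions; the function names and data are illustrative assumptions.

```python
def brier_score(y_true, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def net_benefit(y_true, probs, threshold):
    """Net benefit of acting on predictions at or above a threshold probability."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical predicted probabilities: 1 = PDAC+NOD, 0 = NOD
y_true = [1, 1, 0, 0, 0]
probs = [0.8, 0.4, 0.3, 0.6, 0.1]

brier = brier_score(y_true, probs)
nb_25 = net_benefit(y_true, probs, 0.25)
nb_50 = net_benefit(y_true, probs, 0.50)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the "treat all" and "treat none" strategies.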

3 Results

3.1 Study cohort characteristics

The study included 133 participants (73 NOD, 60 PDAC+NOD) with comparable gender distribution. Significant intergroup differences were observed in diabetes duration, biliary obstruction, fasting glucose, total bilirubin, direct bilirubin, bile acid, albumin, triglyceride, and CEA (p < 0.05), while no significant differences were found in age, sex, BMI, cholesterol, hypertension, smoking, or CA19-9 (Supplementary Table 2).

3.2 Metabolic profiling and biomarker identification

Untargeted metabolomics revealed distinct separation between NOD and PDAC+NOD groups in both positive and negative ionization modes (Figures 1, 2). Using VIP > 1, fold change > 1.4 or < 0.7, and P < 0.05 as screening criteria, we identified a total of 62 significantly altered metabolites (Figures 3A–C; Supplementary Table 3). Pathway enrichment analysis highlighted glycine and serine metabolism, methylhistidine metabolism, arginine and proline metabolism, glutathione metabolism, and methionine metabolism as the most prominent pathways (Figure 3D).
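The screening rule above (VIP > 1, fold change > 1.4 or < 0.7, P < 0.05) amounts to a three-part filter. The sketch below applies it to hypothetical metabolite records; the names and values are invented for illustration.

```python
def passes_screen(vip, fold_change, p_value):
    """Return True when a metabolite meets all three screening criteria."""
    return vip > 1.0 and (fold_change > 1.4 or fold_change < 0.7) and p_value < 0.05


# Hypothetical candidates
candidates = [
    {"name": "metabolite_A", "vip": 1.8, "fc": 0.55, "p": 0.003},
    {"name": "metabolite_B", "vip": 1.2, "fc": 1.10, "p": 0.010},  # fold change too small
    {"name": "metabolite_C", "vip": 0.9, "fc": 1.90, "p": 0.001},  # VIP too low
    {"name": "metabolite_D", "vip": 1.5, "fc": 1.60, "p": 0.200},  # not significant
    {"name": "metabolite_E", "vip": 1.1, "fc": 1.45, "p": 0.040},
]

selected = [c["name"] for c in candidates if passes_screen(c["vip"], c["fc"], c["p"])]
print(selected)
```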

Figure 1
Four panels show scatter plots related to metabolic data. Panel A presents data in positive mode, showing the separation between NOD (brown) and PDAC+NOD (purple) groups. Panel B, in negative mode, also shows group separation with similar color coding. Panel C is a volcano plot in positive mode with points colored by expression changes: no difference (gray), upregulated (red), and downregulated (blue). Panel D is a similar volcano plot in negative mode, showing differential expression with the same color scheme.

Figure 1. Discriminative serum metabolic profiling between NOD and PDAC+NOD groups. (A) PCA in positive mode, (B) PCA in negative mode, (C) Volcano plot of metabolites by univariate analysis in positive mode. (D) Volcano plot of metabolites by univariate analysis in negative mode.

Figure 2
A series of eight plots analyzing data through OPLS-DA and S-plots. Each plot visualizes different datasets and components. Graph A presents a scatter plot with two distinct clusters of red and blue dots. Graph B is a 3D scatter plot with similar clusters. Graph C is a 2D S-plot. Graph D shows a permutation plot with Q2 and R2 values. Graph E is another OPLS-DA plot. Graph F is a 3D scatter plot. Graph G is a 2D S-plot. Graph H is another permutation plot highlighting Q2 and R2 values.

Figure 2. Multidimensional metabolic profiling by OPLS-DA. (A) OPLS-DA in positive mode, (B) OPLS-DA 3D plot in positive mode, (C) S-plot in positive mode, (D) 200-permutation test in positive mode, (E) OPLS-DA in negative mode, (F) OPLS-DA 3D plot in negative mode, (G) S-plot in negative mode, (H) 200-permutation test in negative mode.

Figure 3
Diagram A features concentric circles showing sample processing with decreasing numbers: 1034 identified, 775 normalized, 469 key variables, and 52 meeting significance criteria. Diagram B depicts a similar pattern starting with 908 samples, narrowing to 10. Chart C is a dot plot of enriched metabolite sets, highlighting significance and enrichment ratio across pathways. Diagram D is a heatmap comparing metabolite levels in NOD versus PDAC+NOD, using color intensity to indicate differences.

Figure 3. Identification, expression characterization, and pathway enrichment analysis of differential serum metabolites between NOD and PDAC+NOD Groups. (A) Venn diagram displaying the screening and constitution of altered metabolites in the NOD compared with PDAC + NOD in positive mode, (B) Venn diagram displaying the screening and constitution of altered metabolites in the NOD compared with PDAC + NOD in negative mode. (C) Expression heatmap of the 62 identified metabolites in the serum of NOD and PDAC+NOD groups. (D) Enriched metabolic pathways of differential metabolites. Pathway analysis results should be interpreted as exploratory findings.

3.3 Analytical validation of metabolite quantification

Chromatographic separation and MS/MS parameters were systematically optimized to ensure precise metabolite detection. Representative chromatograms of standard solutions and serum samples (with internal standard) are provided in Supplementary Figures 2, 3. Key analytical parameters, including regression equations, linear ranges, correlation coefficients (r² ≥ 0.9919), and lower limits of quantification (LLOQs), are summarized in Supplementary Table 4. Comprehensive validation data covering precision, accuracy, matrix effects, and stability under various storage conditions are detailed in Supplementary Tables 5–7, collectively confirming the reliability and robustness of our quantitative methodology. Furthermore, all analytical batches consistently met the predefined acceptance criteria, as demonstrated in Supplementary Figure 4.

3.4 Content of one-carbon metabolites in the NOD and PDAC+NOD groups

Serum concentrations of 11 one-carbon metabolism-related metabolites were quantified. The PDAC+NOD group showed reduced levels of multiple metabolites, including L-homocysteine, L-homocystine, L-methionine, L-serine, L-reduced glutathione, and glycine (Supplementary Table 8). These findings are consistent with the observed upregulation of key one-carbon metabolic enzymes (PHGDH, PSAT1, GLDC, SHMT1/2) in PDAC tumors from TCGA-PAAD data (Supplementary Figure 5), suggesting increased tumor utilization of circulating one-carbon metabolites. ROC analysis confirmed the discriminative capacity of these metabolic biomarkers, with individual AUC values detailed in Figure 4.

Figure 4
Graphs A, B, and C depict ROC curves showing sensitivity versus 1-specificity for various compounds. A: SAM, SAH, Hcy, L-HCA with AUCs ranging from 0.005 to 0.754. B: HcySS, Cys, Met, Sor with AUCs between 0.536 and 0.746. C: GSH, Folic acid, Glycine with AUCs from 0.492 to 0.812.

Figure 4. AUC values of various metabolites.

3.5 Machine learning model performance

Following feature selection, an optimal panel of five one-carbon metabolism metabolites was identified: glycine, L-serine, L-methionine, L-homocysteine, and L-homocystine (Supplementary Figure 6A). Among the four machine learning algorithms evaluated, the Gradient Boosting model demonstrated superior overall performance in distinguishing PDAC+NOD from NOD patients (Table 1). In a robust 5-fold cross-validation, this model achieved an area under the curve (AUC) of 0.853 (95% CI: 0.786-0.920), with 75.0% accuracy, 70.6% sensitivity, and 79.4% specificity. The detailed performance metrics for all evaluated models, including Random Forest, Support Vector Machine, and Logistic Regression, are presented in Table 1.

Table 1. Performance comparison of machine learning models using top 5 one-carbon metabolites.

To assess the incremental diagnostic value of the biomarker panel, the 5-metabolite model was compared with the baseline model using only glycine. The 5-metabolite model significantly outperformed the glycine-only model (AUC 0.805, p < 0.001), as determined by DeLong’s test (Supplementary Figure 6B). The incremental value was further confirmed by a Net Reclassification Improvement (NRI) of 35.0% and an Integrated Discrimination Improvement (IDI) of 0.18 when compared to the glycine-only model, indicating a clinically meaningful improvement in risk stratification (Table 2).

Table 2. Baseline model comparison and incremental value metrics.

The final model exhibited good calibration, with a Brier score of 0.185 and close alignment between predicted probabilities and observed frequencies on the calibration curve (Supplementary Figure 6C). Decision curve analysis showed a positive net benefit across a wide range of clinically relevant threshold probabilities (0.05 to 0.50), suggesting the model’s utility in guiding clinical decisions (Supplementary Figure 6D). When contextualized to real-world screening scenarios with PDAC prevalence rates of 0.5% and 1.0%, the model yielded an exceptionally high negative predictive value (NPV) of 99.8% and 99.6%, respectively. The corresponding positive predictive values (PPV) were 1.7% and 3.3%, underscoring the necessity of a two-step screening approach where this biomarker panel is used as a first-line test to rule out disease and select high-risk individuals for confirmatory imaging (Table 3).

Table 3. Positive and negative predictive values at real-world PDAC prevalence levels.
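The predictive values reported above follow from Bayes' rule applied to the cross-validated sensitivity (70.6%) and specificity (79.4%) at each assumed prevalence. The short sketch below reproduces that arithmetic; the function name is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


for prevalence in (0.005, 0.01):
    ppv, npv = predictive_values(0.706, 0.794, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
# prevalence 0.5%: PPV 1.7%, NPV 99.8%
# prevalence 1.0%: PPV 3.3%, NPV 99.6%
```

Because the prevalence is so low, false positives outnumber true positives by a wide margin even at roughly 80% specificity, which is why the PPV stays in the low single digits while the NPV remains close to 100%.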

Sensitivity analyses confirmed the model’s robustness across different normalization methods and demographic subgroups. Importantly, stage-stratified performance evaluation demonstrated strong diagnostic capability for early-stage PDAC (Stage I-II: AUC = 0.841, 95% CI: 0.761-0.921), which holds particular clinical significance as these patients would benefit most from early intervention. The model showed enhanced performance in advanced-stage disease (Stage III-IV: AUC = 0.879, 95% CI: 0.782-0.976), likely reflecting more pronounced metabolic alterations (Supplementary Table 9). Furthermore, exploratory analysis revealed markedly improved performance in patients with elevated glycine/serine ratio (n=68), achieving an AUC of 0.876 (95% CI: 0.792-0.941) with 83.6% accuracy, 82.4% sensitivity, and 84.8% specificity (Supplementary Figure 7). This suggests the glycine/serine ratio may serve as a valuable stratification tool to identify patients who would derive maximum benefit from this diagnostic approach.

4 Discussion

PDAC remains one of the most aggressive malignancies, with a steadily rising global incidence and a five-year survival rate below 10% (1). Most patients are diagnosed at an advanced stage due to nonspecific early symptoms (1), highlighting the critical need for early detection strategies. Evidence suggests that diagnosing PDAC at resectable stages (T1N0M0) can increase five-year survival from 8% to 44% (24, 25), making early interception a key modifiable prognostic factor.

Our findings are consistent with and extend the growing body of metabolomic literature on PDAC. A recent multicenter study that integrated tissue and serum metabolomics also identified disruptions in one-carbon metabolism, including alterations in glycine and serine pathways, as key discriminants of PDAC, underscoring the fundamental role of these metabolic processes in pancreatic tumorigenesis (26). Furthermore, an untargeted metabolomics characterization of resectable PDAC confirmed significant metabolic reprogramming in early-stage disease, highlighting the potential of metabolomics to identify biomarkers for early detection (27). While these studies substantiate the broader relevance of the metabolic pathways we identified, our work specifically elucidates their diagnostic utility within the high-risk NOD cohort, a population with an urgent need for effective screening strategies.

The observed metabolic alterations highlight several key pathways implicated in PDAC pathogenesis within a diabetic context. Dysregulation of glutathione metabolism reflects disrupted redox homeostasis commonly seen in cancer (28–31), while perturbations in glycine/serine and tryptophan metabolism suggest increased nucleotide synthesis demand and immune microenvironment modulation, respectively (32, 33). Central to these changes is one-carbon metabolism, which integrates serine/glycine metabolism with folate and methionine cycles to support nucleotide synthesis and epigenetic regulation through S-adenosylmethionine production (34–40). The significant reduction in one-carbon related metabolites (glycine, L-serine, L-methionine, L-homocysteine, and L-homocystine) in PDAC+NOD patients aligns with increased tumor utilization of these substrates, offering both diagnostic insights and potential therapeutic targets for early detection in high-risk populations. In summary, the increased demand for one-carbon metabolism associated with tumor proliferation may lead to an elevated uptake of related amino acids from the bloodstream, potentially resulting in decreased circulating levels of these metabolites. This is consistent with our findings, which showed significant decreases in serum levels of glycine, L-serine, L-methionine, L-homocysteine, and L-homocystine in the PDAC with diabetes group.

The diagnostic performance of our 5-metabolite panel (overall cross-validated AUC = 0.853, 95% CI: 0.786-0.920; sensitivity = 70.6%, specificity = 79.4%), which improved in patients with an elevated glycine/serine ratio (AUC = 0.876, 95% CI: 0.792-0.941; sensitivity = 82.4%, specificity = 84.8%), demonstrates considerable promise for early PDAC detection in high-risk NOD patients (5, 29). While the current specificity warrants careful consideration for screening applications, it can be optimized through threshold adjustment. The panel’s sensitivity, particularly in the elevated glycine/serine ratio subgroup, renders it suitable as an initial test within a sequential screening strategy. This approach becomes especially relevant given that CA19-9, while showing elevated expression in advanced disease, demonstrates limited sensitivity (50%-60%) for early-stage (I/II) PDAC, making it inadequate as a standalone screening tool. Although direct comparison with CA19-9 was constrained by cohort characteristics, our panel’s performance compares favorably with literature-reported CA19-9 accuracy while simultaneously providing valuable biological insights into PDAC metabolic dysregulation. The age restriction (>65 years) enhances internal validity while limiting generalizability; consequently, future validation in broader populations (including younger NOD patients and those with normal CA19-9 levels) remains essential to establish clinical utility and demonstrate incremental value over existing biomarkers (18, 21).

Several limitations should be considered when interpreting our findings. First, our study did not fully adjust for all potential clinical confounders, including medications and comorbidities, in the multivariable model due to sample size constraints. While subgroup analyses suggested consistent performance across these variables, future studies with larger cohorts should incorporate comprehensive adjustment for these factors to confirm the independence of the metabolic signature. Second, the limited sample size from a single institution constrains the generalizability of our findings, necessitating external validation to confirm the diagnostic performance of our metabolite panel. Third, our study lacks epidemiological data on the progression of NOD to PDAC in the enrolled population, which would have provided valuable context for risk stratification. Future multi-center studies with larger sample sizes and prospective validation are needed to confirm the clinical utility of the proposed biomarkers. Additionally, collecting longitudinal data on NOD populations will be essential for establishing more accurate PDAC risk prediction models.

In summary, this study highlights the potential of a serum metabolomic signature for early PDAC detection in high-risk NOD patients. However, limitations include the single-center design, modest sample size, and lack of external validation. Further validation in larger, multi-center cohorts and longitudinal studies is needed to establish clinical utility.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Ethics statement

The study protocol was approved by the Medical Ethics Committee of Shanghai General Hospital (approval number: 2021KY095).

Author contributions

WJ: Writing – original draft, Writing – review & editing. ZC: Methodology, Writing – original draft. RM: Data curation, Investigation, Writing – original draft. HS: Formal analysis, Validation, Writing – original draft. ZG: Investigation, Software, Writing – original draft. GY: Data curation, Formal Analysis, Writing – original draft. DW: Conceptualization, Writing – review & editing. LY: Writing – original draft, Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the National Natural Science Foundation of China (NO.82370656), the Natural Science Foundation of Shanghai (NO.23ZR1450900), the Investigator-initiated Trial Program of Shanghai Pudong New Area Health Commission (the Medical and Industrial Integration Program) (2025-PWYC-04), the “Three Navigation” Plan of Talent Training in Pudong New Area Gongli Hospital (2025-GLSHQH-03), Shanghai Municipal Health Commission Clinical Research Special Project of the Health Industry in 2025 (20254Y0066), and Zhejiang Traditional Chinese Medicine Science and Technology Plan in 2025 (Young Talents Support Program Project) (NO. 2025ZR149).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fendo.2025.1684608/full#supplementary-material

References

1. Siegel RL, Miller KD, Fuchs HE, and Jemal A. Cancer statistics, 2021. CA Cancer J Clin. (2021) 71:7–33. doi: 10.3322/caac.21654

2. Strobel O, Neoptolemos J, Jäger D, and Büchler MW. Optimizing the outcomes of pancreatic cancer surgery. Nat Rev Clin Oncol. (2019) 16:11–26. doi: 10.1038/s41571-018-0112-1

3. Partensky C. Toward a better understanding of pancreatic ductal adenocarcinoma: Glimmers of hope? Pancreas. (2013) 42:729–39. doi: 10.1097/MPA.0b013e318288107a

4. Siegel RL, Miller KD, and Jemal A. Cancer statistics, 2019. CA Cancer J Clin. (2019) 69:7–34. doi: 10.3322/caac.21551

5. Ren S, Song LN, Zhao R, Tian Y, and Wang ZQ. Serum exosomal hsa-let-7f-5p: A potential diagnostic biomarker for metastatic pancreatic cancer detection. World J Gastroenterol. (2025) 31:109500. doi: 10.3748/wjg.v31.i26.109500

6. Fahrmann JF, Schmidt CM, Mao X, Irajizad E, Loftus M, Zhang J, et al. Lead-time trajectory of CA19-9 as an anchor marker for pancreatic cancer early detection. Gastroenterology. (2021) 160:1373–83.e6. doi: 10.1053/j.gastro.2020.11.052

7. Groot VP, Gemenetzis G, Blair AB, Rivero-Soto RJ, Yu J, Javed AA, et al. Defining and predicting early recurrence in 957 patients with resected pancreatic ductal adenocarcinoma. Ann Surg. (2019) 269:1154–62. doi: 10.1097/SLA.0000000000002734

8. Luo G, Jin K, Deng S, Cheng H, Fan Z, Gong Y, et al. Roles of CA19-9 in pancreatic cancer: biomarker, predictor and promoter. Biochim Biophys Acta Rev Cancer. (2020) 1875:188409. doi: 10.1016/j.bbcan.2020.188409

9. Capello M, Bantis LE, Scelo G, Zhao Y, Li P, Dhillon DS, et al. Sequential validation of blood-based protein biomarker candidates for early-stage pancreatic cancer. J Natl Cancer Inst. (2017) 109:djw266. doi: 10.1093/jnci/djw266

10. Gerdtsson AS, Wingren C, Persson H, Delfani P, Nordström M, Ren H, et al. Plasma protein profiling in a stage defined pancreatic cancer cohort—Implications for early diagnosis. Mol Oncol. (2016) 10:1305–16. doi: 10.1016/j.molonc.2016.07.001

11. Jenkinson C, Elliott V, Menon U, Apostolidou S, Fourkala OE, Gentry-Maharaj A, et al. Evaluation in pre-diagnosis samples discounts ICAM-1 and TIMP-1 as biomarkers for earlier diagnosis of pancreatic cancer. J Proteom. (2015) 113:400–2. doi: 10.1016/j.jprot.2014.10.001

12. Makawita S, Dimitromanolakis A, Soosaipillai A, Soleas I, Chan A, Gallinger S, et al. Validation of four candidate pancreatic cancer serological biomarkers that improve the performance of CA19-9. BMC Cancer. (2013) 13:404. doi: 10.1186/1471-2407-13-404

13. Pan S, Brentnall TA, and Chen R. Proteomics analysis of bodily fluids in pancreatic cancer. Proteomics. (2015) 15:2705–15. doi: 10.1002/pmic.201400476

14. Park J, Lee E, Park KJ, Park HD, Kim JW, Woo HI, et al. Large-scale clinical validation of biomarkers for pancreatic cancer using a mass spectrometry-based proteomics approach. Oncotarget. (2017) 8:42761–71. doi: 10.18632/oncotarget.17463

15. Park J, Choi Y, Namkung J, Yi SG, Kim H, Yu J, et al. Diagnostic performance enhancement of pancreatic cancer using proteomic multimarker panel. Oncotarget. (2017) 8:93117–30. doi: 10.18632/oncotarget.21861

16. Yoneyama T, Ohtsuki S, Honda K, Kobayashi M, Iwasaki M, Uchida Y, et al. Identification of IGFBP2 and IGFBP3 as compensatory biomarkers for CA19-9 in early-stage pancreatic cancer using a combination of antibody-based and LC-MS/MS-based proteomics. PLoS One. (2016) 11:e0161009. doi: 10.1371/journal.pone.0161009

17. Fahrmann JF, Bantis LE, Capello M, Scelo G, Dennison JB, Patel N, et al. A plasma-derived protein-metabolite multiplexed panel for early-stage pancreatic cancer. J Natl Cancer Inst. (2019) 111:372–9. doi: 10.1093/jnci/djy126

18. Ren S, Qian LC, Cao YY, Daniels MJ, Song LN, Tian Y, et al. Computed tomography-based radiomics diagnostic approach for differential diagnosis between early- and late-stage pancreatic ductal adenocarcinoma. World J Gastrointest Oncol. (2024) 16:1256–67. doi: 10.4251/wjgo.v16.i4.1256

19. Ren S, Qin B, Daniels MJ, et al. Developing and validating a computed tomography radiomics strategy to predict lymph node metastasis in pancreatic cancer. World J Radiol. (2025) 17:109373. doi: 10.4329/wjr.v17.i8.109373

20. Luo XL, Liu JJ, Wang HZ, and Lu HT. Metabolomics identified new biomarkers for the precise diagnosis of pancreatic cancer and associated tissue metastasis. Pharmacol Res. (2020) 156:104805. doi: 10.1016/j.phrs.2020.104805

21. Pannala R, Basu A, Petersen GM, and Chari ST. New-onset diabetes: A potential clue to the early diagnosis of pancreatic cancer. Lancet Oncol. (2009) 10:88–95. doi: 10.1016/S1470-2045(08)70337-1

22. Li D. Diabetes and pancreatic cancer. Mol Carcinog. (2012) 51:64–74. doi: 10.1002/mc.20771

23. Mizrahi JD, Surana R, Valle JW, and Shroff RT. Pancreatic cancer. Lancet. (2020) 395:2008–20. doi: 10.1016/S0140-6736(20)30974-0

24. Pereira SP, Oldfield L, Ney A, Hart PA, Keane MG, Pandol SJ, et al. Early detection of pancreatic cancer. Lancet Gastroenterol Hepatol. (2020) 5:698–710. doi: 10.1016/S2468-1253(19)30416-9

25. He X, Zhong J, Wang S, Zhou Y, Wang L, Zhang Y, et al. Serum metabolomics differentiating pancreatic cancer from new-onset diabetes. Oncotarget. (2017) 8:29116–24. doi: 10.18632/oncotarget.16249

26. Zhao R, Ren S, Li C, Guo K, Lu Z, Tian L, et al. Biomarkers for pancreatic cancer based on tissue and serum metabolomics analysis in a multicenter study. Cancer Med. (2023) 12:5158–71. doi: 10.1002/cam4.5296

27. Cao YY, Guo K, Zhao R, Li Y, Lv XJ, Lu ZP, et al. Untargeted metabolomics characterization of the resectable pancreatic ductal adenocarcinoma. Digit Health. (2023) 9:20552076231179007. doi: 10.1177/20552076231179007

28. AACR Cancer Progress Report 2022 Steering Committee. Cancer in 2022. Cancer Discov. (2022) 12:2733–8. doi: 10.1158/2159-8290.CD-22-1134

29. Strobel O, Hank T, Hinz U, Bergmann F, Schneider L, Springfeld C, et al. Pancreatic cancer surgery: the new R-status counts. Ann Surg. (2017) 265:565–73. doi: 10.1097/SLA.0000000000001731

30. Armitage EG and Barbas C. Metabolomics in cancer biomarker discovery: current trends and future perspectives. J Pharm Biomed Anal. (2014) 87:1–11. doi: 10.1016/j.jpba.2013.08.041

31. Dey P, Kimmelman AC, and DePinho RA. Metabolic codependencies in the tumor microenvironment. Cancer Discov. (2021) 11:1067–81. doi: 10.1158/2159-8290.CD-20-1211

32. Hayes JD, Dinkova-Kostova AT, and Tew KD. Oxidative stress in cancer. Cancer Cell. (2020) 38:167–97. doi: 10.1016/j.ccell.2020.06.001

33. Harris IS and DeNicola GM. The complex interplay between antioxidants and ROS in cancer. Trends Cell Biol. (2020) 30:440–51. doi: 10.1016/j.tcb.2020.03.002

34. Yan J, Chen D, Ye Z, Zhu X, Li X, Jiao H, et al. Molecular mechanisms and therapeutic significance of Tryptophan Metabolism and signaling in cancer. Mol Cancer. (2024) 23:241. doi: 10.1186/s12943-024-02164-y

35. Labuschagne CF, van den Broek NJ, Mackay GM, Vousden KH, and Maddocks OD. Serine, but not glycine, supports one-carbon metabolism and proliferation of cancer cells. Cell Rep. (2014) 7:1248–58. doi: 10.1016/j.celrep.2014.04.045

36. Mukha D, Fokra M, Feldman A, Sarvin B, Sarvin N, Nevo-Dinur K, et al. Glycine decarboxylase maintains mitochondrial protein lipoylation to support tumor growth. Cell Metab. (2022) 34:775–82. doi: 10.1016/j.cmet.2022.04.006

37. Rinaldi G, Pranzini E, Van Elsen J, Broekaert D, Funk CM, Planque M, et al. In Vivo Evidence for Serine Biosynthesis-Defined Sensitivity of Lung Metastasis, but Not of Primary Breast Tumors, to mTORC1 Inhibition. Mol Cell. (2021) 81:386–97. doi: 10.1016/j.molcel.2020.11.027

38. Xu R, Jones W, Wilcz-Villega E, Costa AS, Rajeeve V, Bentham RB, et al. The breast cancer oncogene IKKepsilon coordinates mitochondrial function and serine metabolism. EMBO Rep. (2020) 21:e48260. doi: 10.15252/embr.201948260

39. Smith ALM, Whitehall JC, and Greaves LC. Mitochondrial DNA mutations in ageing and cancer. Mol Oncol. (2022) 16:3276–94. doi: 10.1002/1878-0261.13291

40. Biancur DE, Paulo JA, Małachowska B, Quiles Del Rey M, and Sousa CM. Compensatory metabolic networks in pancreatic cancers upon perturbation of glutamine metabolism. Nat Commun. (2017) 8:15965. doi: 10.1038/ncomms15965

Keywords: metabolic biomarkers, pancreatic ductal adenocarcinoma, new-onset diabetes, early detection, metabolomics, machine learning

Citation: Jiang W, Cheng Z, Mu R, Sun H, Guo Z, Yu G, Wang D and Yang L (2025) Machine learning-optimized metabolic biomarker panel for precision screening of early-stage pancreatic cancer in new-onset diabetes. Front. Endocrinol. 16:1684608. doi: 10.3389/fendo.2025.1684608

Received: 12 August 2025; Revised: 24 October 2025; Accepted: 21 November 2025;
Published: 08 December 2025.

Edited by:

Xin Li, Lanzhou University, China

Reviewed by:

Shuai Ren, Affiliated Hospital of Nanjing University of Chinese Medicine, China
Lei Li, University of Otago, New Zealand

Copyright © 2025 Jiang, Cheng, Mu, Sun, Guo, Yu, Wang and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Dongyan Wang, wdy01635@glhospital.com; Lijuan Yang, humourlife001@163.com

†These authors have contributed equally to this work
