ORIGINAL RESEARCH article
Front. Immunol.
Sec. Cancer Immunity and Immunotherapy
This article is part of the Research Topic: Advances in the Treatment of Nasopharyngeal Cancer
Machine Learning-Based Prediction of Nasopharyngeal Carcinoma (NPC) Risk: A Clinical Approach
Provisionally accepted
- 1Department of Laboratory Medicine, People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, China
- 2Department of Dermatology, People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, China
- 3Key Laboratory of Nasopharyngeal Carcinoma Molecular Epidemiology, Wuzhou Red Cross Hospital, Wuzhou, China
- 4Department of Laboratory Medicine, The First People's Hospital of Fangchenggang City, Fangchenggang, Guangxi Zhuang Autonomous Region, Fangchenggang, China
- 5Department of Laboratory Medicine, The People's Hospital of Yongning District, Nanning, China
Background: Early screening and risk assessment of nasopharyngeal carcinoma (NPC) are essential for timely diagnosis and improved treatment outcomes. This study aimed to develop and evaluate predictive models using logistic regression and machine learning (ML) techniques to identify significant risk factors for NPC across various healthcare settings.
Methods: A total of 569 participants were enrolled in the internal training and validation cohorts, and 160 were enrolled in the independent external validation cohort. Several Epstein-Barr virus (EBV)-related antibodies and serological and hematological markers were assessed to identify features that discriminate between NPC and non-NPC individuals. Feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression, recursive feature elimination with cross-validation (RFECV), and support vector machine recursive feature elimination with cross-validation (SVM-RFECV). The performance of nine ML models (logistic regression (LR), eXtreme Gradient Boosting (XGBoost), light gradient boosting machine (LightGBM), random forest (RF), AdaBoost, multilayer perceptron (MLP), decision tree (DT), gradient boosting decision tree (GBDT), and Gaussian Naïve Bayes (GNB)) was evaluated using the area under the curve (AUC), accuracy (ACC), sensitivity (SE), and specificity (SP) in both the training and validation cohorts. Model calibration was assessed using calibration plots, and clinical utility was evaluated through decision curve analysis (DCA).
Results: Five key predictors (nuclear antigen 1 immunoglobulin A (NTA1-IgA), viral capsid antigen immunoglobulin A (VCA-IgA), Rta protein immunoglobulin A (Rta-IgA), platelet (PLT) count, and lymphocyte (LM) count) were consistently identified across the three feature selection algorithms. The XGBoost model achieved the highest performance in the internal training (AUC = 0.999) and validation (AUC = 0.995) cohorts; it also outperformed the other models in the independent external validation cohort, with an AUC of 0.956. Calibration plots and DCA for both the internal and independent external cohorts then confirmed the strong clinical utility of the XGBoost model. An online tool also enables real-time NPC risk prediction based on the five selected biomarkers.
Conclusion: This study presents a robust and interpretable ML-based approach for NPC risk prediction, integrating EBV serology and hematological markers. The model demonstrated high predictive accuracy and potential for population-based screening, providing an efficient tool for early NPC detection and intervention planning.
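For readers unfamiliar with the evaluation quantities named in the Methods, the following is a minimal sketch of how SE, SP, ACC, AUC, and the net benefit used in DCA can be computed from predicted risks. It is not the authors' code: the function names, the 0.5 cut-off, and the toy labels and scores are all hypothetical, and the sketch assumes binary labels (1 = NPC, 0 = non-NPC) with both classes present.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_score: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), calling a case positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity_accuracy(y_true: Sequence[int],
                                     y_score: Sequence[float],
                                     threshold: float = 0.5
                                     ) -> Tuple[float, float, float]:
    """SE = TP/(TP+FN), SP = TN/(TN+FP), ACC = (TP+TN)/N."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / (tp + fp + tn + fn)


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                pt: float) -> float:
    """Net benefit at threshold probability pt (0 < pt < 1), as plotted in a
    decision curve: TP/N - FP/N * pt / (1 - pt)."""
    tp, fp, _, _ = confusion_counts(y_true, y_score, pt)
    n = len(y_true)
    return tp / n - (fp / n) * pt / (1 - pt)


# Hypothetical toy data, for illustration only (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

se, sp, acc = sensitivity_specificity_accuracy(toy_labels, toy_scores, 0.5)
print(f"SE={se:.3f} SP={sp:.3f} ACC={acc:.3f} "
      f"AUC={auc(toy_labels, toy_scores):.4f} "
      f"NB(0.5)={net_benefit(toy_labels, toy_scores, 0.5):.3f}")
```

SE, SP, and ACC depend on the chosen cut-off, whereas AUC summarizes ranking quality over all cut-offs. A decision curve repeats the net benefit calculation over a range of threshold probabilities and compares the model with treat-all and treat-none strategies.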
Keywords: Nasopharyngeal carcinoma (NPC), NPC screening, Predictive modeling, Machine learning (ML), Epstein-Barr virus (EBV), Logistic regression
Received: 17 Jun 2025; Accepted: 05 Nov 2025.
Copyright: © 2025 Yang, Zhou, Tang, Huang, Zhu, Li, Huang, Liang, Pan and Yuan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Yulin Yuan, yuanyulin@126.com
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
