- 1Department of Otolaryngology, General Hospital of Ningxia Medical University, Yinchuan, China
- 2Peking University First Hospital Ningxia Women and Children’s Hospital, Yinchuan, China
- 3The Second Clinical Medical College, Ningxia Medical University, Yinchuan, China
- 4School of Clinical Medicine, Ningxia Medical University, Yinchuan, China
Background: This study aims to develop and validate a survival prediction model for T4 or N3 locally advanced nasopharyngeal carcinoma (NPC) patients undergoing chemoradiotherapy (CRT) using machine learning methods.
Methods: A total of 293 patients with locally advanced NPC (T4 or N3 stage) treated with CRT were included in the study. The cohort was divided into a training set (173 patients) and a validation set (120 patients). LASSO regression was used to identify significant prognostic factors, and Cox regression analysis was performed to assess the independent impact of these factors on progression-free survival (PFS). A nomogram was constructed based on the identified prognostic factors to predict 1-, 2-, and 3-year PFS. Model performance was validated using ROC curves, calibration curves, and decision curve analysis (DCA).
Results: The 1-, 2-, and 3-year PFS rates were 92.4%, 81.3%, and 75.2% in the training cohort and 90.1%, 83.5%, and 76.0% in the validation cohort, with no significant difference between the groups (P = 0.94). The LASSO-Cox model identified N stage and Epstein-Barr virus (EBV) levels as key prognostic factors. The nomogram achieved AUC values of 0.802, 0.709, and 0.686 at 1, 2, and 3 years, respectively, indicating that discrimination was strongest at 1 year and declined at later time points. The calibration curves showed good agreement between predicted and observed outcomes, and DCA indicated a clinical net benefit across a range of risk thresholds.
Conclusion: The survival prediction model based on LASSO and Cox regression provides a robust and interpretable tool for predicting PFS in patients with T4 or N3 locally advanced NPC undergoing CRT.
Introduction
Nasopharyngeal carcinoma (NPC) is a malignancy that originates in the epithelial cells of the nasopharynx (1). It is notably prevalent in Southeast Asia, particularly in China, with a strong association with Epstein-Barr virus (EBV) infection (2). NPC is often diagnosed at advanced stages, with local invasion and extensive lymph node metastasis being significant features (3). Among the various stages, locally advanced NPC, particularly in T4 and N3 stages, presents a challenge for treatment due to its poor prognosis, despite aggressive therapies such as concurrent chemoradiotherapy (CRT) (4).
Standard treatment for advanced-stage NPC, including T4 and N3, involves CRT, which has improved survival outcomes (5). However, even with this treatment approach, many patients still experience high rates of recurrence and distant metastasis (6). Therefore, accurately predicting survival outcomes for these patients is critical in tailoring treatment strategies to maximize therapeutic benefit and minimize unnecessary toxicity. Traditional prognostic models, which often rely on clinical factors such as tumor size, lymph node involvement, and EBV status, have limitations in predicting individual patient outcomes due to the complexity of disease progression and treatment responses (7–9).
Recent advances in statistical and machine learning methods have provided new avenues for improving survival prediction. Among these, LASSO and Cox regression models have become increasingly popular. LASSO is an effective technique for selecting the most important variables from a large dataset, ensuring the final model is both efficient and interpretable (10, 11). The Cox proportional hazards model, widely used in survival analysis, allows for examining the relationship between various prognostic factors and patient survival outcomes (12).
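For orientation, the LASSO-penalized Cox model is commonly written as the optimization problem below. This is the standard textbook form, not a statement of the exact settings used in this study. The notation is introduced here for illustration: $x_i$ is the covariate vector of patient $i$, $\delta_i$ is the event indicator (1 for an observed event, 0 for censoring), $R(t_i)$ is the set of patients still at risk at time $t_i$, and $\lambda \ge 0$ is the penalty weight, typically chosen by cross-validation.

```latex
\hat{\beta}
  = \arg\max_{\beta}
    \left\{
      \sum_{i:\,\delta_i = 1}
        \left[
          x_i^{\top}\beta
          - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right)
        \right]
      - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
    \right\}
```

The first term is the Cox log partial likelihood. The second term shrinks coefficients toward zero and sets some of them exactly to zero, which is what makes LASSO a variable selection method. Software packages may scale the likelihood term differently (for example, dividing by the sample size), so values of $\lambda$ are not directly comparable across implementations.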
For locally advanced NPC patients, particularly those with T4 or N3 disease, a survival prediction model based on LASSO and Cox regression can be highly effective (13). By integrating multiple clinical variables, such as age, sex, tumor stage, treatment modalities, and response to therapy, this model can offer a more personalized prediction of patient survival. The LASSO method selects the most significant factors, while Cox regression provides insights into how these factors influence survival outcomes over time (14).
The ability to generate accurate and interpretable survival predictions is essential for clinicians, as it helps them identify high-risk patients early, allowing for the optimization of treatment regimens. By providing more tailored care, this approach has the potential to significantly improve survival rates and quality of life for patients with locally advanced NPC, thereby advancing personalized medicine in this challenging clinical context.
Methods
Patients
This study retrospectively collected data from 293 patients with locally advanced NPC from three tertiary hospitals in China, covering the period from 2012 to 2020. The inclusion criteria were: 1) a pathological diagnosis of NPC, 2) disease classified as T4 or N3 stage according to the 8th edition of the AJCC staging system, 3) receipt of concurrent chemoradiotherapy (CRT), and 4) availability of follow-up data. Exclusion criteria included: 1) previous treatment with other therapies, such as surgery or non-standard treatments, and 2) incomplete or missing data, which hindered the ability to conduct a comprehensive analysis.
This study was approved by the ethics committee of General Hospital of Ningxia Medical University, and all patients provided informed consent for participation in the study.
Model construction
First, LASSO regression was performed to select prognostic factors associated with progression-free survival (PFS). Patients were randomly divided into training and validation sets in a 6:4 ratio. In the training set, univariate and multivariate Cox regression analyses were conducted to identify independent prognostic factors for PFS, and these factors were then used to construct a nomogram for predicting PFS. PFS was defined as the time from the initiation of CRT to the first occurrence of disease progression, recurrence, distant metastasis, or death from any cause. Patients without such an event at the last follow-up were censored at that time point.
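The PFS definition above can be made concrete with a minimal Python sketch. The function name `pfs_record`, its arguments, and the conversion of days to months using an average month length are assumptions made for illustration. The study's analyses were run in R, and the original data handling is not described.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 30.4375  # average month length; an assumption for this sketch


def pfs_record(crt_start: date,
               last_follow_up: date,
               first_event: Optional[date] = None) -> Tuple[float, int]:
    """Return (PFS in months, event indicator) for one patient.

    first_event is the earliest date of progression, recurrence, distant
    metastasis, or death from any cause. None means no event was observed.
    """
    if first_event is not None:
        end, event = first_event, 1      # event observed
    else:
        end, event = last_follow_up, 0   # censored at last follow-up
    months = (end - crt_start).days / DAYS_PER_MONTH
    return round(months, 1), event


# One patient with an event after one year, one patient censored at three years.
with_event = pfs_record(date(2015, 1, 1), date(2018, 1, 1), date(2016, 1, 1))
censored = pfs_record(date(2015, 1, 1), date(2018, 1, 1))
```

The pair (time, event indicator) is the input format expected by most survival routines, including Kaplan-Meier estimation and Cox regression.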
Model validation
In the validation set, the performance of the model was assessed using receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA). In the training set, model performance was further validated using partial dependence plots (PDP), time-dependent variable importance plots, and the Brier score.
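As an illustration of what decision curve analysis computes, the sketch below evaluates the net benefit at a single risk threshold. It is a simplified version that treats the outcome as a binary event at a fixed time horizon and ignores censoring. A full survival DCA, as presumably used here, needs censoring-aware estimates. The function names and the example data are hypothetical.

```python
from typing import Sequence


def net_benefit(pred_risk: Sequence[float],
                event: Sequence[int],
                threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(event)
    true_pos = sum(1 for p, y in zip(pred_risk, event) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(pred_risk, event) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(event: Sequence[int], threshold: float) -> float:
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(event) / len(event)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical predicted risks and observed events (1 = progressed).
risks = [0.9, 0.8, 0.3, 0.1]
events = [1, 0, 1, 0]
model_nb = net_benefit(risks, events, 0.2)
treat_all_nb = net_benefit_treat_all(events, 0.2)
```

A decision curve repeats this calculation over a range of thresholds and compares the model against the treat-all and treat-none (net benefit of zero) strategies.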
Statistical analysis
Categorical variables and continuous variables were compared using the Chi-square test and appropriate parametric or non-parametric tests, respectively. The risk dependence plot was used to explain the PFS outcomes. Kaplan-Meier (KM) curves were used to analyze the survival rates of the training and validation sets, and Log-rank tests were used to compare differences. All statistical analyses were conducted using R software, and a p-value of <0.05 was considered statistically significant.
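The Kaplan-Meier estimate mentioned above can be sketched in a few lines of Python. This is a minimal product-limit implementation for illustration, with hypothetical function names and toy data. It is not the R code used in the study.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(event time, survival probability just after that time), ...]."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Collect all patients (events and censored) sharing this time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_removed
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Read the estimated survival probability at time t from the step curve."""
    prob = 1.0
    for time, surv in curve:
        if time <= t:
            prob = surv
    return prob


# Toy data: PFS in months and event indicators (1 = event, 0 = censored).
curve = kaplan_meier([6, 12, 12, 24, 30], [1, 1, 0, 1, 0])
pfs_12_months = survival_at(curve, 12)
```

Patients censored at the same time as an event are counted as still at risk for that event, which is the usual convention.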
Results
Baseline
In the total cohort of 293 patients with locally advanced NPC, the distribution of baseline variables was as follows: 77.1% were male, the mean age was 45.6 years, and 53.9% were aged 45 or older. Regarding tumor staging, 57.3% of patients were classified as T4 and 49.1% as N3. Regarding EBV DNA levels, 36.9% of patients had levels ≥10000. No significant differences in the distribution of these variables were observed between the training set (173 patients) and the validation set (120 patients) (Table 1).
Survival
In the training cohort, the 1-, 2-, and 3-year PFS rates were 92.4%, 81.3%, and 75.2%, respectively. In the validation cohort, the 1-, 2-, and 3-year PFS rates were 90.1%, 83.5%, and 76.0%, respectively. There were no significant differences in PFS between the two groups (P = 0.94, Figure 1).
Model construction
The LASSO model identified age, T stage, N stage, and EBV as risk factors influencing PFS (Supplementary Figures 1A, B). In the training cohort, multivariate Cox analysis confirmed that N stage and EBV levels were independent prognostic factors for PFS (Table 2). Based on N and EBV, a nomogram was constructed to predict 1-, 2-, and 3-year PFS (Figure 2).
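To illustrate how a nomogram turns Cox coefficients into points, the sketch below rescales each predictor's contribution to the linear predictor so that the predictor with the largest effect range spans 0 to 100 points. The coefficient values, variable names, and binary coding are hypothetical and are not taken from Table 2.

```python
from typing import Dict, Tuple


def nomogram_points(coefs: Dict[str, float],
                    ranges: Dict[str, Tuple[float, float]],
                    patient: Dict[str, float]) -> Dict[str, float]:
    """Convert Cox coefficients into nomogram points for one patient."""
    # Span of each predictor's contribution to the linear predictor.
    spans = {k: abs(b) * (ranges[k][1] - ranges[k][0]) for k, b in coefs.items()}
    max_span = max(spans.values())
    if max_span == 0:
        raise ValueError("all predictors have zero effect range")
    points = {}
    for name, beta in coefs.items():
        low, high = ranges[name]
        # Zero points are assigned at the end of the range with the lowest risk.
        reference = low if beta >= 0 else high
        points[name] = 100.0 * beta * (patient[name] - reference) / max_span
    return points


# Hypothetical log hazard ratios for two binary predictors.
coefs = {"n3_stage": 0.8, "ebv_high": 0.5}
ranges = {"n3_stage": (0, 1), "ebv_high": (0, 1)}
pts = nomogram_points(coefs, ranges, {"n3_stage": 1, "ebv_high": 1})
total_points = sum(pts.values())
```

In a full nomogram, the total points are then mapped to 1-, 2-, and 3-year PFS probabilities through the baseline survival function of the fitted Cox model, which is not reproduced here.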
Model validation
The ROC curve shows the model’s performance with AUC values at 1 year (0.802), 2 years (0.709), and 3 years (0.686), indicating varying discrimination ability (Supplementary Figure 2A). The calibration curve confirms how well the predicted PFS aligns with observed outcomes (Supplementary Figure 2B). The decision curve analysis demonstrates the clinical net benefit of the model at different risk thresholds (Supplementary Figure 2C).
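The AUC values above can be read as the probability that a patient who progressed by a given time point received a higher predicted risk than a patient who did not. The sketch below computes this quantity by pairwise comparison for a binary outcome at a fixed horizon. It ignores censoring, so it is a simplification of the time-dependent ROC analysis used for survival data, and the example scores are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as the share of correctly ranked pairs."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5  # ties count as half
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted risks and outcomes at a fixed horizon (1 = progressed).
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the decline from 0.802 at 1 year to 0.686 at 3 years.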
Model interpretation
In the training cohort, partial dependence plots (PDP) confirmed that higher EBV levels, older age, and more advanced T and N stages were associated with worse survival (Figure 3). Supplementary Figure 3A shows the model’s performance over time, with the Brier score decreasing, indicating improved prediction accuracy. The graph demonstrates that the model’s ability to discriminate between different survival outcomes improves over time, as indicated by the gradual increase in AUC. Supplementary Figure 3B shows that EBV and N stage are the most important factors affecting PFS. Figure 4A shows the distribution of risk scores, with a cutoff of 1.65 separating low-risk (blue) and high-risk (red) groups. Figure 4B indicates that high-risk patients have shorter PFS, while low-risk patients have longer survival. Figure 4C highlights clinical variables (EBV, N/T stage, age), showing higher EBV levels and more advanced stages in the high-risk group.
 
Figure 3. Partial dependence plots (PDPs) illustrating the relationship between each prognostic factor (age, Epstein-Barr virus level, N stage, and T stage) and progression-free survival (PFS).
 
Figure 4. (A) Risk scores are divided into low-risk (blue) and high-risk (red) groups based on the cutoff of 1.65. (B) Progression-free survival is shown for each patient, with the low-risk group (blue) having longer survival than the high-risk group (red). (C) The heatmap shows the distribution of clinical variables across the risk groups.
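The risk stratification shown in Figure 4 amounts to splitting patients at the reported cutoff of 1.65. A minimal sketch follows. The text does not state how the risk score was computed or whether a score exactly equal to the cutoff counts as high risk, so the example scores and the `>=` rule are assumptions.

```python
from typing import Dict, List, Sequence

CUTOFF = 1.65  # cutoff reported in the text


def split_by_risk(scores: Sequence[float],
                  cutoff: float = CUTOFF) -> Dict[str, List[int]]:
    """Group patient indices into low-risk and high-risk by risk score."""
    groups: Dict[str, List[int]] = {"low": [], "high": []}
    for index, score in enumerate(scores):
        # Assumption: a score equal to the cutoff is treated as high risk.
        groups["high" if score >= cutoff else "low"].append(index)
    return groups


# Hypothetical risk scores for four patients.
groups = split_by_risk([0.9, 1.65, 2.3, 1.2])
```

The two index lists can then be passed to a Kaplan-Meier routine to compare PFS between the low-risk and high-risk groups.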
Discussion
Locally advanced NPC, particularly those at T4 and N3 stages, presents significant treatment challenges and is associated with poor prognosis (15). Despite the aggressive nature of concurrent CRT, many patients with advanced NPC experience high rates of recurrence and distant metastasis, which worsens their survival outcomes (16). This research is focused on improving prognostication for these high-risk patients by developing a survival prediction model using advanced statistical and machine learning techniques, such as LASSO and Cox regression. By integrating clinical variables, the aim is to enhance personalized treatment planning and offer more accurate survival predictions for patients diagnosed with locally advanced NPC.
The current standard treatment for locally advanced NPC, including T4 and N3 stages, remains concurrent CRT (17). While CRT has been shown to improve survival rates, the long-term prognosis for these patients remains suboptimal. High rates of recurrence and distant metastasis suggest that conventional treatment strategies may not be sufficient for all patients, underscoring the importance of developing better prognostic tools to guide treatment decisions (18). Effective prediction models can potentially identify high-risk individuals early, enabling more tailored and aggressive interventions while avoiding unnecessary toxicity in low-risk patients.
In this study, we leveraged LASSO regression to select the most influential prognostic factors, followed by Cox regression for survival analysis. This combination allows for the creation of a robust and interpretable model, which provides both predictive power and clinical applicability. LASSO helps mitigate overfitting by performing variable selection from a broad set of potential predictors, ensuring that only the most relevant factors are included in the final model (19, 20). The use of Cox regression further enhances model interpretability by quantifying the impact of each variable on survival outcomes (21). Additionally, we utilized PDP to visualize the relationship between continuous predictors (e.g., EBV levels and age) and survival, providing valuable insights into how these factors influence prognosis. This feature makes the model more interpretable and clinically relevant, offering a deeper understanding of patient outcomes (22).
The performance of our model is reflected in its ROC curve, with AUC values of 0.802, 0.709, and 0.686 for 1, 2, and 3 years, respectively. This demonstrates good predictive ability and discrimination power, particularly in the short term, which is crucial for clinical decision-making. However, the gradual decline in AUC also highlights the limitations of long-term prediction. Possible explanations include the increasing influence of unmeasured factors (such as genetic or immune characteristics), treatment heterogeneity, and biological variability of the disease over time, all of which may reduce the accuracy of long-term prognostic estimation. Despite these limitations, the model remains clinically valuable: it can help identify high-risk patients with locally advanced NPC who are prone to recurrence or metastasis, thereby guiding clinicians in selecting appropriate treatment regimens and enabling more timely, personalized interventions. Future models incorporating multi-omics or immune-related data may further enhance long-term prediction and improve patient outcomes.
Furthermore, our model may have practical implications in guiding future treatment strategies. Patients identified by the model as having poor predicted outcomes could be considered as candidates for novel therapeutic approaches, such as immunotherapy (23, 24). Recent studies have shown that PD-1/PD-L1 inhibitors provide meaningful clinical benefits in recurrent or metastatic NPC, and ongoing trials are exploring their role in combination with chemoradiotherapy in locally advanced disease (25, 26). By integrating prognostic prediction with treatment selection, our model could help clinicians identify high-risk patients who may benefit from immunotherapy, thereby improving individualized treatment planning.
Despite the promising results, several limitations must be acknowledged (27, 28). Firstly, this study is retrospective, which introduces potential biases inherent in observational studies. The cohort is derived from three tertiary hospitals, which may limit its generalizability to other regions with different patient populations and healthcare settings. Additionally, while we included several clinical variables in the model, the lack of genetic, radiomics, and immune profiling data could reduce the model’s predictive accuracy. Importantly, the study only performed internal validation, and the absence of external validation in independent cohorts limits the robustness and generalizability of the findings. Moreover, detailed information on recurrence sites (local, regional, distant) was not available, which may restrict deeper understanding of prognostic implications. Finally, treatment heterogeneity across hospitals, such as variations in radiation doses or chemotherapy regimens, could influence outcomes and complicate the interpretation of results.
Conclusion
In conclusion, this study presents a novel survival prediction model for patients with locally advanced NPC, particularly those with T4 and N3 stages.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.
Ethics statement
The studies involving humans were approved by Medical Research Ethics Review Committee of the General Hospital of Ningxia Medical University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
ZM: Writing – original draft, Writing – review & editing. WL: Writing – review & editing, Writing – original draft, Conceptualization. XL: Formal Analysis, Writing – review & editing, Writing – original draft. XN: Writing – review & editing, Writing – original draft. YL: Project administration, Methodology, Writing – review & editing, Writing – original draft. YM: Validation, Writing – original draft, Writing – review & editing. LH: Writing – review & editing, Writing – original draft.
Funding
The author(s) declare financial support was received for the research and/or publication of this article. This work was supported by the Natural Science Foundation of Ningxia (2023AAC02065, 2024AAC03622) and the Key R&D Programme of the Ningxia Autonomous Region (2022BEG03105).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2025.1683501/full#supplementary-material
References
1. Chen YP, Chan ATC, Le QT, Blanchard P, Sun Y, and Ma J. Nasopharyngeal carcinoma. Lancet. (2019) 394:64–80. doi: 10.1016/S0140-6736(19)30956-0
2. Zhang Z, Chen X, and Yuan T. Precision radiotherapy for nasopharyngeal carcinoma. Prec Radiat Oncol. (2024) 8:37–41. doi: 10.1002/pro6.1219
3. Fei Z, Xu T, Qiu X, Li M, Chen T, Li L, et al. Significance of boost dose for T4 nasopharyngeal carcinoma with residual primary lesion after intensity-modulated radiotherapy. J Cancer Res Clin Oncol. (2021) 147:2047–55. doi: 10.1007/s00432-020-03479-1
4. Pan JJ, Mai HQ, Ng WT, Hu CS, Li JG, Chen XZ, et al. Ninth version of the AJCC and UICC nasopharyngeal cancer TNM staging classification. JAMA Oncol. (2024) 10:1627–35. doi: 10.1001/jamaoncol.2024.4354
5. Lin TY, Lan MY, Tsou HH, Ho CY, Twu CW, Liu YC, et al. Survival impacts of different nodal characteristics and T-classification in N3 nasopharyngeal carcinoma patients. Oral Oncol. (2020) 108:104820. doi: 10.1016/j.oraloncology.2020.104820
6. Liao W, Zhao Y, and Zhang S. The evolution of prophylactic neck irradiation in nasopharyngeal carcinoma: Changing concepts and irradiation ranges. Prec Radiat Oncol. (2025) 9:61–68. doi: 10.1002/pro6.70007
7. Jiang Y, Qu S, Pan X, Huang S, and Zhu X. Prognostic nomogram for locoregionally advanced nasopharyngeal carcinoma. Sci Rep. (2020) 10:861. doi: 10.1038/s41598-020-57968-x
8. Zhai X, Yuan J, Su X, Zhang H, and Guo R. Optimized nomogram for nasopharyngeal carcinoma prognosis prediction in younger patients (Aged 18-59): development and validation. Ear Nose Throat J. (2024) 29:1455613231223901. doi: 10.1177/01455613231223901
9. Wang Y, Jian W, Yuan Z, Guan F, and Carlson D. Deep learning with attention modules and residual transformations improves hepatocellular carcinoma (HCC) differentiation using multiphase CT. Prec Radiat Oncol. (2025) 9:13–22. doi: 10.1002/pro6.70003
10. Song S, Song S, Zhao H, Huang S, Xiao X, Lv X, et al. Using machine learning methods to investigate the impact of age on the causes of death in patients with early intrahepatic cholangiocarcinoma who underwent surgery. Clin Transl Oncol. (2025) 27:1623–31. doi: 10.1007/s12094-024-03716-w
11. Su K, Liu X, Zeng YC, Xu J, Li H, Wang H, et al. Machine learning radiomics for predicting response to MR-guided radiotherapy in unresectable hepatocellular carcinoma: A multicenter cohort study. J Hepatocell Carcinoma. (2025) 12:933–47. doi: 10.2147/JHC.S521378
12. Toumi N, Ennouri S, Charfeddine I, Daoud J, and Khanfir A. Prognostic factors in metastatic nasopharyngeal carcinoma. Braz J Otorhinolaryngol. (2022) 88:212–9. doi: 10.1016/j.bjorl.2020.05.022
13. Peng H, Chen L, Zhang Y, Guo R, Li WF, Mao YP, et al. Survival analysis of patients with advanced-stage nasopharyngeal carcinoma according to the Epstein-Barr virus status. Oncotarget. (2016) 7:24208–16. doi: 10.18632/oncotarget.8144
14. Yang Y, Luo W, Feng Z, Chen X, Li J, Zuo L, et al. An integrative analysis combining bioinformatics, network pharmacology and experimental methods identified key genes of EGCG targets in Nasopharyngeal Carcinoma. Discov Oncol. (2025) 16:742. doi: 10.1007/s12672-025-02365-x
15. Cai M, Wang Y, Ma H, Yang L, and Xu Z. Advances and challenges in immunotherapy for locally advanced nasopharyngeal carcinoma. Cancer Treat Rev. (2024) 131:102840. doi: 10.1016/j.ctrv.2024.102840
16. Mané M, Benkhaled S, Dragan T, Paesmans M, Beauvois S, Lalami Y, et al. Meta-analysis on induction chemotherapy in locally advanced nasopharyngeal carcinoma. Oncologist. (2021) 26:e130–41. doi: 10.1002/ONCO.13520
17. Kong L, Zhang YW, Hu CS, and Guo Y. Neoadjuvant chemotherapy followed by concurrent chemoradiation for locally advanced nasopharyngeal carcinoma. Chin J Cancer. (2010) 29:551–5. doi: 10.5732/cjc.009.10518
18. Isobe K, Ito H, Shigematsu N, Kawada T, Yasuda S, Hara R, et al. Advanced nasopharyngeal carcinoma treated with chemotherapy and radiotherapy: distant metastasis and local recurrence. Int J Oncol. (1998) 12:1183–7. doi: 10.3892/ijo.12.5.1183
19. Kang J, Choi YJ, Kim IK, Lee HS, Kim H, Baik SH, et al. LASSO-based machine learning algorithm for prediction of lymph node metastasis in T1 colorectal cancer. Cancer Res Treat. (2021) 53:773–83. doi: 10.4143/crt.2020.974
20. Ali H, Shahzad M, Sarfraz S, Sewell KB, Alqalyoobi S, and Mohan BP. Application and impact of Lasso regression in gastroenterology: A systematic review. Indian J Gastroenterol. (2023) 42:780–90. doi: 10.1007/s12664-023-01426-9
21. Zhang Z, Reinikainen J, Adeleke KA, Pieterse ME, and Groothuis-Oudshoorn CGM. Time-varying covariates and coefficients in Cox regression models. Ann Transl Med. (2018) 6:121. doi: 10.21037/atm.2018.02.12
22. Alkhanani MF. Predictive modeling of hemoglobin refractive index using Gaussian process regression with interpretability through partial dependence plots. PLoS One. (2025) 20:e0324827. doi: 10.1371/journal.pone.0324827
23. Huang H, Yao Y, Deng X, Huang Z, Chen Y, Wang Z, et al. Immunotherapy for nasopharyngeal carcinoma: Current status and prospects (Review). Int J Oncol. (2023) 63:97. doi: 10.3892/ijo.2023.5545
24. Hong M, Tang K, Qian J, Deng H, Zeng M, Zheng S, et al. Immunotherapy for EBV-associated nasopharyngeal carcinoma. Crit Rev Oncog. (2018) 23:219–34. doi: 10.1615/CritRevOncog.2018027528
25. Han J, Zeng N, Tian K, Liu Z, She L, Wang Z, et al. First-line immunotherapy combinations for recurrent or metastatic nasopharyngeal carcinoma: An updated network meta-analysis and cost-effectiveness analysis. Head Neck. (2023) 45:2246–58. doi: 10.1002/hed.27452
26. Yeo BSY, Lee RS, Lim NE, Tan E, Jang IJH, Toh HC, et al. Efficacy and safety of cell-based immunotherapy in the treatment of recurrent or metastatic nasopharyngeal carcinoma - A systematic review and meta-analysis. Oral Oncol. (2024) 152:106786. doi: 10.1016/j.oraloncology.2024.106786
27. Wang L, Zhou X, Yan H, Miao Y, Wang B, Gu Y, et al. Deciphering the role of tryptophan metabolism-associated genes ECHS1 and ALDH2 in gastric cancer: implications for tumor immunity and personalized therapy. Front Immunol. (2024) 15:1460308. doi: 10.3389/fimmu.2024.1460308
Keywords: prediction, nasopharyngeal carcinoma, chemotherapy, radiation, machine learning
Citation: Ma Z, Liu W, Luo X, Niu X, Li Y, Ma Y and Hou L (2025) Prediction of prognosis in T4 or N3 locally advanced nasopharyngeal carcinoma receiving chemoradiotherapy using machine learning methods. Front. Oncol. 15:1683501. doi: 10.3389/fonc.2025.1683501
Received: 19 August 2025; Accepted: 29 September 2025;
Published: 09 October 2025.
Edited by:
Yong Yin, Shandong University, China
Reviewed by:
Pengcheng Zhang, Zhejiang Hospital of Traditional Chinese Medicine, China
Shanshan Lin, Guangdong Medical University, China
Copyright © 2025 Ma, Liu, Luo, Niu, Li, Ma and Hou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Li Hou, aGxhaGw5OUBzaW5hLmNvbQ==