ORIGINAL RESEARCH article
Front. Bioinform.
Sec. Integrative Bioinformatics
Volume 5 - 2025 | doi: 10.3389/fbinf.2025.1624329
This article is part of the Research Topic "Clinical prediction models in cancer through bioinformatics".
Identification of Multiple Prognostic Biomarker Sets for Risk Stratification in SKCM
Provisionally accepted. Indraprastha Institute of Information Technology Delhi, Delhi, India
Most existing studies have identified a single profile of prognostic biomarkers for predicting high-risk cancer patients from transcriptomics data. In this study, we propose multiple distinct sets of prognostic biomarkers for predicting high-risk Skin Cutaneous Melanoma (SKCM) patients. Our primary analysis reveals that the expression of certain genes, such as CREG1, PCGF5, and VPS13C, is strongly correlated with overall survival (OS) in SKCM patients. We developed machine learning-based prognostic models to predict 1-, 3-, and 5-year OS from gene expression profiles. State-of-the-art feature selection techniques were employed to identify a primary prognostic biomarker set of 20 genes. Machine learning models trained on this set achieved an AUC of 0.90 and a Kappa of 0.58 in distinguishing high-risk SKCM patients. A second independent set of 20 prognostic genes, with no overlap with the primary set, was then identified; the best model trained on it achieved an AUC of 0.89 and a Kappa of 0.56. Repeating this process yielded a total of seven distinct prognostic biomarker sets, each containing 20 unique genes. On the test dataset, the performance of these models ranged from 0.84 to 0.91 in AUC and from 0.48 to 0.64 in Kappa, with the fifth biomarker set achieving the highest performance (AUC of 0.91, Kappa of 0.64). These findings demonstrate that it is possible to identify multiple independent sets of prognostic biomarkers, each capable of accurately predicting high-risk SKCM patients. We validated a subset of genes from the primary and third prognostic biomarker sets on the independent GEO dataset GSE65904, achieving AUCs of 0.83 and 0.86, respectively, which supports the predictive value of our biomarkers. All data and code are available at https://github.com/raghavagps/skcm_prognostic_biomarker.
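The abstract describes an iterative scheme: select a gene set, evaluate a high-risk classifier built on it by AUC and Kappa, then exclude the selected genes and repeat to obtain non-overlapping sets. The following is a minimal sketch of that loop on synthetic data, not the authors' method. It assumes a univariate effect-size filter in place of their feature selection and a signed z-score risk score in place of their machine learning models. All names (`disjoint_biomarker_sets`, `risk_scores`, `simulate`, `GENE0`, ...) are hypothetical, and for brevity it draws 3 sets of 10 genes rather than 7 sets of 20. It uses only the Python standard library.

```python
# Hypothetical sketch, standard library only. Assumptions (not from the paper):
# a univariate effect-size filter stands in for the authors' feature selection,
# and a signed z-score sum stands in for their machine learning models.
import random
from statistics import mean, pstdev


def auc(labels, scores):
    """ROC AUC as the Mann-Whitney probability that a random high-risk
    sample outscores a random low-risk one (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(labels, preds):
    """Agreement between true and predicted binary labels, corrected for chance."""
    n = len(labels)
    observed = sum(a == b for a, b in zip(labels, preds)) / n
    p_true, p_pred = sum(labels) / n, sum(preds) / n
    expected = p_true * p_pred + (1 - p_true) * (1 - p_pred)
    if expected == 1.0:  # undefined when both vectors hold a single class
        return 0.0
    return (observed - expected) / (1 - expected)


def effect_size(values, labels):
    """Signed score: mean(high-risk) - mean(low-risk), divided by the overall SD."""
    hi = [v for v, y in zip(values, labels) if y == 1]
    lo = [v for v, y in zip(values, labels) if y == 0]
    return (mean(hi) - mean(lo)) / (pstdev(values) or 1.0)


def disjoint_biomarker_sets(expr, labels, n_sets, k):
    """Pick n_sets gene sets of size k, excluding genes chosen in earlier rounds.
    expr maps gene name -> expression values, one per sample."""
    scores = {g: effect_size(v, labels) for g, v in expr.items()}
    used, sets = set(), []
    for _ in range(n_sets):
        remaining = [g for g in expr if g not in used]
        chosen = sorted(remaining, key=lambda g: abs(scores[g]), reverse=True)[:k]
        used.update(chosen)
        sets.append(chosen)
    return sets, scores


def risk_scores(train, test, genes, scores):
    """Per-sample sum of z-scored expression (training mean/SD), signed by each
    gene's direction of association; higher means higher predicted risk."""
    stats = {g: (mean(train[g]), pstdev(train[g]) or 1.0) for g in genes}
    n = len(test[genes[0]])
    return [
        sum((1.0 if scores[g] > 0 else -1.0) * (test[g][i] - stats[g][0]) / stats[g][1]
            for g in genes)
        for i in range(n)
    ]


def simulate(n_samples, n_genes, n_informative, seed):
    """Toy cohort (label 1 = high-risk); informative genes shift up or down by label."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    expr = {}
    for j in range(n_genes):
        shift = 0.0
        if j < n_informative:  # effect decays with j; sign alternates up/down
            shift = (1.0 - j / n_informative) * (1 if j % 2 == 0 else -1)
        expr[f"GENE{j}"] = [rng.gauss(shift * y, 1.0) for y in labels]
    return expr, labels


def split(expr, labels, n_train):
    """First n_train samples for training (gene selection), the rest for testing."""
    train = {g: v[:n_train] for g, v in expr.items()}
    test = {g: v[n_train:] for g, v in expr.items()}
    return train, labels[:n_train], test, labels[n_train:]


if __name__ == "__main__":
    expr, labels = simulate(n_samples=300, n_genes=80, n_informative=40, seed=0)
    tr_x, tr_y, te_x, te_y = split(expr, labels, n_train=200)
    sets, scores = disjoint_biomarker_sets(tr_x, tr_y, n_sets=3, k=10)
    for i, genes in enumerate(sets, start=1):
        risk = risk_scores(tr_x, te_x, genes, scores)
        preds = [1 if r > 0 else 0 for r in risk]  # threshold at 0 (training-centred)
        print(f"set {i}: AUC={auc(te_y, risk):.2f} Kappa={cohen_kappa(te_y, preds):.2f}")
```

With a univariate filter, the loop amounts to taking successive slices of a single ranking. A multivariate selector refit on the remaining genes at each round would change which genes enter later sets, and the loop structure accommodates that case. Genes are selected on the training split only, so the test-split AUC and Kappa are not inflated by selection.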
Keywords: Skin cutaneous melanoma, overall survival, survival analysis, Machine learning, prognostic biomarker
Received: 07 May 2025; Accepted: 20 Oct 2025.
Copyright: © 2025 Malik, Tomer, Arora and Raghava. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Gajendra PS Raghava, raghava@iiitd.ac.in
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.