ORIGINAL RESEARCH article
Front. Bioinform.
Sec. Integrative Bioinformatics
Volume 5 - 2025 | doi: 10.3389/fbinf.2025.1644695
BC-Predict: Mining of signal biomarkers and production of models for early-stage breast cancer subtyping and prognosis
Provisionally accepted
- 1Systems Computational Biology Lab, School of Chemical and Biotechnology, SASTRA University, Thanjavur, India
- 2Lincoln City Hospital, United Lincolnshire Hospitals NHS Trust, Lincoln, United Kingdom
Disease heterogeneity is the hallmark of breast cancer, which remains the most common female malignancy. With mortality and disease burden rising consistently, there is a need for effective early-stage theragnostic and prognostic biomarkers. In this work, we improved on BrcaDx (https://apalania.shinyapps.io/brcadx/) for cancer vs control screening and examined a cluster of adjoining learning problems in breast cancer heterogeneity: (i) identification of metastatic cancers; (ii) molecular subtyping (TNBC, HER2, or luminal); and (iii) histological subtyping (invasive ductal or invasive lobular). We analysed the transcriptomic profiles of breast cancer patients from public-domain databases such as the TCGA using stage-encoded, problem-specific statistical models of gene expression and unveiled a handful of stage-salient and progression-significant genes. Using a consensus approach, we identified potential machine learning features and considered six model classes for each learning problem, with hyperparameter optimization on a training dataset and evaluation on a holdout test dataset. A nested approach enabled us to identify the best model class for each learning problem. External validation of the best models yielded balanced accuracies of 97.42% for cancer vs normal; 88.22% for metastatic vs non-metastatic; 88.79% for ternary molecular subtyping; and an ensemble accuracy of 94.23% for histological subtyping. The model for molecular subtyping was also validated on a 26-sample TNBC-only out-of-distribution cohort, yielding 25 correct predictions. In addition, we performed a late integration of multi-omics datasets by validating the feature space used in each problem with miRNA profiles, methylation profiles, and commercial breast cancer panels.
Pending prospective studies, we have translated the models into BC-Predict, which forks the best models developed for each problem and provides an integrated readout for input instances of expression data, together with uncertainty estimates. BC-Predict is freely available for non-commercial purposes at: https://apalania.shinyapps.io/BC-Predict.
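The validation results above are reported as balanced accuracies. As a reading aid, the following is a minimal sketch of how balanced accuracy is conventionally computed (the mean of per-class recall), assuming the standard definition; the abstract does not state the exact formula used. The function name `balanced_accuracy` and the toy labels are hypothetical and are not data or code from the study.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels for a ternary subtyping problem.
y_true = ["TNBC"] * 2 + ["HER2"] * 2 + ["luminal"] * 6
y_pred = ["TNBC", "TNBC", "HER2", "luminal"] + ["luminal"] * 6

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"plain accuracy:    {plain_accuracy:.4f}")
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.4f}")
```

In this toy case one of the two HER2 samples is misclassified, so the balanced figure falls below the plain accuracy; this sensitivity to minority classes is why the metric is commonly preferred when subtype frequencies are unequal.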
Keywords: breast cancer, disease heterogeneity, machine learning, molecular subtype, histological subtype, metastatic disease, stage-specific differential expression, biomarker discovery and validation
Received: 10 Jun 2025; Accepted: 12 Aug 2025.
Copyright: © 2025 Muthamilselvan, Vaithilingam and Palaniappan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Ashok Palaniappan, Systems Computational Biology Lab, School of Chemical and Biotechnology, SASTRA University, Thanjavur, India
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.