ORIGINAL RESEARCH article

Front. Bioinform.

Sec. Integrative Bioinformatics

Volume 5 - 2025 | doi: 10.3389/fbinf.2025.1644695

BC-Predict: Mining of signal biomarkers and production of models for early-stage breast cancer subtyping and prognosis

Provisionally accepted
Sangeetha Muthamilselvan1, Natarajan Vaithilingam2, Ashok Palaniappan1*
  • 1Systems Computational Biology Lab, School of Chemical and Biotechnology, SASTRA University, Thanjavur, India
  • 2Lincoln City Hospital, United Lincolnshire Hospitals NHS Trust, Lincoln, United Kingdom

The final, formatted version of the article will be published soon.

Disease heterogeneity is the hallmark of breast cancer, which remains the most common female malignancy. With a consistent increase in mortality and disease burden, there exists a need for effective early-stage theragnostic and prognostic biomarkers. In this work, we improved on BrcaDx (https://apalania.shinyapps.io/brcadx/) for cancer vs control screening and examined a cluster of adjoining learning problems in breast cancer heterogeneity: (i) identification of metastatic cancers; (ii) molecular subtyping (TNBC, HER2, or luminal); and (iii) histological subtyping (invasive ductal or invasive lobular). We analysed the transcriptomic profiles of breast cancer patients from public-domain databases such as TCGA using stage-encoded, problem-specific statistical models of gene expression and unveiled a handful of stage-salient and progression-significant genes. Using a consensus approach, we identified potential machine learning features and considered six model classes for each learning problem, with hyperparameter optimization on a training dataset and evaluation on a holdout test dataset. A nested approach enabled us to identify the best model class for each learning problem. External validation of the best models yielded balanced accuracies of 97.42% for cancer vs normal, 88.22% for metastatic vs non-metastatic, and 88.79% for ternary molecular subtyping, and an ensemble accuracy of 94.23% for histological subtyping. The model for molecular subtyping was also validated on a 26-sample TNBC-only out-of-distribution cohort, yielding 25 correct predictions. In addition, we performed a late integration of multi-omics datasets by validating the feature space used in each problem with miRNA profiles, methylation profiles, and commercial breast cancer panels.
Pending prospective studies, we have translated the models into BC-Predict, which forks the best models developed for each problem and provides an integrated readout for input instances of expression data, together with uncertainty estimates. BC-Predict is freely available for non-commercial purposes at: https://apalania.shinyapps.io/BC-Predict.
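
The abstract reports balanced accuracies without stating how they were computed. As a minimal sketch, assuming the standard definition (the unweighted mean of per-class recall), the metric can be written as below. The function name and the toy labels are illustrative assumptions, not the authors' code or data; only the 25-of-26 figure for the TNBC-only cohort comes from the abstract.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed standard definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels for a ternary subtyping problem (not study data).
y_true = ["TNBC"] * 2 + ["HER2"] * 2 + ["luminal"] * 6
y_pred = ["TNBC", "TNBC", "HER2", "luminal"] + ["luminal"] * 6

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)

# Single-class cohort: 25 of 26 TNBC samples predicted correctly, as reported.
# The label of the one miss is a placeholder; the abstract does not give it.
tnbc_true = ["TNBC"] * 26
tnbc_pred = ["TNBC"] * 25 + ["luminal"]
tnbc_only = balanced_accuracy(tnbc_true, tnbc_pred)

print(f"plain accuracy:    {plain:.4f}")
print(f"balanced accuracy: {balanced:.4f}")
print(f"TNBC-only cohort:  {tnbc_only:.4f}")
```

In the toy example the single HER2 error costs more under balanced accuracy than under plain accuracy, because each class contributes equally regardless of its size. On a single-class cohort such as the TNBC-only set, the metric reduces to the recall of that class, here 25/26.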

Keywords: breast cancer, disease heterogeneity, machine learning, molecular subtype, histological subtype, metastatic disease, stage-specific differential expression, biomarker discovery and validation

Received: 10 Jun 2025; Accepted: 12 Aug 2025.

Copyright: © 2025 Muthamilselvan, Vaithilingam and Palaniappan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Ashok Palaniappan, Systems Computational Biology Lab, School of Chemical and Biotechnology, SASTRA University, Thanjavur, India

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.