Your new experience awaits. Try the new design now and help us make it even better

ORIGINAL RESEARCH article

Front. Bioinform.

Sec. RNA Bioinformatics

This article is part of the Research TopicMachine Learning and AI Tools for RNA BioinformaticsView all articles

An explainable-AI framework reveals novel lncRNAs specific for breast cancer subtypes

Provisionally accepted
  • University of Nebraska Medical Center, Omaha, NE, United States

The final, formatted version of the article will be published soon.

Background: Long non-coding RNAs (lncRNAs) have emerged as important regulators in cancer biology; yet their potential for cancer subtyping remains underexplored particularly in the context of large-scale, multi-class supervised classification frameworks, due to limited publicly available data or their use only as auxiliary features in classification tasks. Methods: In this study, we utilized an expansive set of 7177 lncRNAs obtained from 1021 breast cancer (BRCA) transcriptomics datasets for subtyping using an explainable artificial intelligence (AI) framework. lncRNA, mRNA, and miRNA features were used to build machine learning (ML) models individually and in combination. Four ML classifiers: Naïve Bayes, Random Forest, Artificial Neural Network, and XGBoost were employed to evaluate subtype classification performance. Results: Using lncRNAs alone, XGBoost demonstrated strong performance with an accuracy of 89.2% and AUROC of 0.99. Addition of miRNA or mRNA features to lncRNA marginally improved the accuracy to 90.8% and 92.2%, respectively, while using all the three features together provided no further gain. A sequential key feature identification pipeline (ANOVA, Boruta, SHAP) has identified interpretable subtype-specific biomarker panels, yielding 119, 66, 54, and 24 unique features for Luminal A, Luminal B, HER2+, and Basal subtypes, respectively. Further lncRNA characterization followed by survival analysis revealed significant subtype-specific novel lncRNAs, including CUFF.25255 (LumA), CUFF.20237 and CUFF.3888 (LumB), CUFF.22414 (HER2+), and CUFF.26607 and CUFF.1961 (Basal). Conclusions: Our findings highlight the diagnostic and biomarker discovery potential of lncRNAs, and the explainable-AI framework implemented here provides a systematic large-scale evaluation of lncRNA-only and integrative models for multi-class BRCA subtyping for BRCA subtyping and can be adopted to other cancers using the existing cancer transcriptomics data in the public databases.

Keywords: Breast cancer subtyping, Explainable-ai, long non-coding RNA, Multi-omics integration, NovellncRNA

Received: 04 Dec 2025; Accepted: 16 Feb 2026.

Copyright: © 2026 Patel, Veerappa and Guda. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Chittibabu Guda

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.