ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Medicine and Public Health

This article is part of the Research Topic: Artificial Intelligence-based Multimodal Imaging and Multi-omics in Medical Research

Scaling Transformers to High-Dimensional Sparse Data: A Reformer-BERT Approach for Large-Scale Classification

Provisionally accepted
Wanxuan Li 1,2, Xinhua Li 3, Weihang Guo 2, Boyuan Gu 4, Jianjun Du 3, Ning Chi 3, Dan Shao 1, Kai Xiao 1*, Ren Mo 2,3,4*
  • 1 South China University of Technology School of Medicine, Guangzhou, China
  • 2 Department of Urology, Inner Mongolia People’s Hospital, Inner Mongolia Urological Institute, Hohhot, China
  • 3 Affiliated Inner Mongolia Clinical College of Inner Mongolia Medical University, Hohhot, China
  • 4 Institutes of Biomedical Sciences, Inner Mongolia University, Hohhot, China

The final, formatted version of the article will be published soon.

Objective: Precise identification of human cell types and their interactions is fundamental to biological research. Single-cell RNA sequencing (scRNA-seq) has opened new avenues for such work, but manually annotating cell types from the high-dimensional molecular feature data it generates remains challenging. This study aimed to develop and evaluate a robust, large-scale pre-trained model for automated cell type classification, focusing on major cell categories in this initial study.

Methods: A cell type classification method, scReformer-BERT, was developed on a BERT (Bidirectional Encoder Representations from Transformers) architecture that integrates Reformer encoders. The model underwent extensive self-supervised pre-training on substantial scRNA-seq datasets, followed by supervised fine-tuning and 5-fold cross-validation to optimize predictive accuracy on first-tier cell type classification tasks. An ablation study was conducted to dissect the contribution of each architectural component, and SHAP (SHapley Additive exPlanations) analysis was used to interpret the model's decisions.

Results: The proposed model was evaluated in a series of experiments on scRNA-seq data. Across these experiments, it consistently outperformed several established baseline methods in classifying major cell categories, despite the difficulties inherent to the task.

Conclusion: The large-scale pre-trained model, which combines Reformer encoders with a BERT architecture, offers an effective and interpretable solution for automated cell type classification from scRNA-seq data. Its performance suggests considerable utility in improving both the efficiency and precision of cell identification in high-throughput genomic investigations.
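
As an illustration of the 5-fold cross-validation step named in the Methods, the following is a minimal sketch of how cells might be split into five folds. The abstract does not say how the folds were constructed; the stratification by cell type, the function name `stratified_kfold`, and the toy labels are all assumptions made for this example, not the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class shares similar per fold.

    Hypothetical helper for illustration; not taken from the article.
    """
    rng = random.Random(seed)

    # Group cell indices by their cell type label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the held-out set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for f, fold in enumerate(folds) if f != i for idx in fold
        )
        yield train_idx, test_idx


# Toy labels standing in for major cell categories (assumed, not from the study).
labels = ["T cell"] * 10 + ["B cell"] * 5 + ["Monocyte"] * 5
splits = list(stratified_kfold(labels, k=5))
for fold, (train_idx, test_idx) in enumerate(splits):
    print(f"fold {fold}: {len(train_idx)} train cells, {len(test_idx)} test cells")
```

In a real pipeline, the fine-tuned classifier would be trained on `train_idx` and scored on `test_idx` for each fold, with the five scores averaged. Stratifying keeps rare cell categories represented in every held-out fold, which matters when class sizes are uneven.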

Keywords: cell type, major cell categories, classification, gene expression, scRNA-seq

Received: 07 Jul 2025; Accepted: 31 Oct 2025.

Copyright: © 2025 Li, Li, Guo, Gu, Du, Chi, Shao, Xiao and Mo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Kai Xiao, kxiaoilsazive@outlook.com
Ren Mo, moren325@163.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.