ORIGINAL RESEARCH article

Front. Genet.

Sec. Computational Genomics

This article is part of the Research Topic: Advancements in AI for the Analysis and Interpretation of Large-scale Data by Omics Techniques

scVAR: integrating genomics and transcriptomics from single-cell RNA-seq — insights from leukemia case studies

Provisionally accepted
  • ¹ Institute of Biomedical Technologies, Department of Biomedical Sciences, National Research Council (CNR), Segrate, Italy
  • ² Experimental Hematology Unit, Division of Immunology, Transplantation and Infectious Diseases, IRCCS San Raffaele Scientific Institute, Milan, Lombardy, Italy
  • ³ San Raffaele Telethon Institute for Gene Therapy (SR-Tiget), Milan, Lombardy, Italy

The final, formatted version of the article will be published soon.

The advent of high-throughput technologies has accelerated biomedical research by facilitating the investigation of biological complexity at unprecedented resolution. Single-cell RNA sequencing (scRNA-seq) has transformed our ability to deconstruct cellular heterogeneity in complex diseases. Acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), for example, are characterized by extensive genetic and phenotypic heterogeneity, making diagnosis and therapy challenging. Although genetic variation is conventionally studied via DNA-based methods, the transcriptome can also be a source of genomic information. Here, we present scVAR, a computational framework that employs variational autoencoders to learn and integrate genetic variation directly from scRNA-seq data. scVAR implements a paired encoder–decoder architecture with a cross-attention–based fusion layer that combines transcriptomic and variant-derived information into a unified latent representation, enhancing the detection of subtle cellular differences under noisy and sparse conditions. We demonstrate its application to leukemia case studies, where scVAR reveals cell identities that are not discernible when transcriptomic or genomic data are analyzed separately. In the datasets analyzed in this study, scVAR identifies approximately 20–30% more subpopulations than transcriptomic analysis alone, highlighting the benefit of integrating variant information even when coverage is limited. As expected for 3′ scRNA-seq, variant detection is restricted to captured regions, but scVAR maximizes the information available within these constraints. Overall, scVAR bridges the gap between transcriptomics and genomics, providing a broadly applicable platform for the integrative characterization of cell states and disease processes.
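
To make the architecture described above more concrete, the following is a minimal NumPy sketch of one forward pass through a paired-encoder variational autoencoder with a cross-attention fusion step. It is not the authors' implementation. The layer sizes, the weight names, the splitting of each modality's hidden vector into tokens, the residual fusion, and the choice of letting transcriptomic tokens query variant tokens are all assumptions made for illustration, and the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Random weight matrix (untrained; for illustration only)."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Queries come from one modality; keys and values come from the other."""
    q = query_tokens @ w_q                    # (cells, tokens, d_token)
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)        # each query's weights sum to 1
    return weights @ v, weights

# Hypothetical sizes and simulated inputs (assumptions, not from the article).
n_cells, n_genes, n_variants = 8, 50, 30
n_tokens, d_token, d_latent = 4, 6, 5
expr = rng.poisson(1.0, size=(n_cells, n_genes)).astype(float)              # counts
variants = rng.binomial(1, 0.1, size=(n_cells, n_variants)).astype(float)   # 0/1 calls

# Paired encoders: one per modality, each producing a small set of tokens.
w_enc_rna = dense(n_genes, n_tokens * d_token)
w_enc_var = dense(n_variants, n_tokens * d_token)
h_rna = np.tanh(np.log1p(expr) @ w_enc_rna).reshape(n_cells, n_tokens, d_token)
h_var = np.tanh(variants @ w_enc_var).reshape(n_cells, n_tokens, d_token)

# Fusion: transcriptomic tokens attend over variant tokens, with a residual sum.
w_q, w_k, w_v = (dense(d_token, d_token) for _ in range(3))
attended, attn_weights = cross_attention(h_rna, h_var, w_q, w_k, w_v)
fused = (h_rna + attended).reshape(n_cells, -1)

# Unified latent representation via the reparameterisation trick.
w_mu = dense(fused.shape[1], d_latent)
w_logvar = dense(fused.shape[1], d_latent)
mu, logvar = fused @ w_mu, fused @ w_logvar
z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

# Paired decoders: reconstruct each modality from the shared latent.
w_dec_rna, w_dec_var = dense(d_latent, n_genes), dense(d_latent, n_variants)
expr_rate = np.exp(z @ w_dec_rna)                        # positive expression rates
variant_prob = 1.0 / (1.0 + np.exp(-(z @ w_dec_var)))    # per-variant probabilities

# KL divergence of q(z|x) from a standard normal prior, one value per cell.
kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
```

In a trained model, `z` would be the per-cell embedding passed to clustering, and the loss would combine the two reconstruction terms with `kl`. The sketch only shows how the pieces connect.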

Keywords: genetic heterogeneity, leukemia, multi-omics integration, single-cell RNA sequencing, variational autoencoder

Received: 01 Apr 2025; Accepted: 08 Dec 2025.

Copyright: © 2025 Celli, Manessi, Barcella and Merelli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Ludovica Celli
Ivan Merelli

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.