- Department of Computer Science, University of California, Irvine, Irvine, CA, United States
Spatial transcriptomics (ST) technologies enable the profiling of gene expression while preserving spatial context, offering unprecedented insights into tissue organization. However, traditional spatial domain identification methods primarily rely on gene expression matrices and spatial coordinates while overlooking the rich biological knowledge encoded in gene functional descriptions. Here, we propose SpaLLM, a general framework that integrates large language model (LLM) embeddings of gene descriptions with conventional spatial transcriptomics analysis. Our approach leverages pre-computed GenePT embeddings of NCBI gene summaries to create biologically informed gene representations. SpaLLM combines these LLM-derived gene features with cell-gene expression matrices through matrix multiplication, generating enriched cell representations that capture both expression patterns and functional knowledge. These enriched features are then integrated with existing graph-based spatial analysis methods for improved spatial domain identification. Extensive validation on 12 sequencing-based Visium sections and an independent imaging-based osmFISH dataset demonstrates that SpaLLM consistently enhances spatial domain identification. Our modular framework can be seamlessly integrated with existing spatial analysis pipelines, making it broadly applicable to diverse research scenarios.
1 Introduction
Spatial transcriptomics (ST) technologies have revolutionized our understanding of tissue architecture by enabling simultaneous measurement of gene expression and spatial location information (Rodriques et al., 2019; Emani et al., 2024; Ruzicka et al., 2024). A fundamental task in ST analysis is spatial domain identification, which aims to partition tissue sections into distinct regions based on similar gene expression patterns and spatial proximity. These spatial domains often correspond to anatomical structures, functional units, or pathological states, making their accurate identification crucial for understanding tissue organization and disease mechanisms (Chen et al., 2022).
Current spatial domain identification methods predominantly follow a graph-based approach, where spots or cells are represented as nodes in a spatial graph, and edges encode spatial proximity relationships (Hu et al., 2021; Zhao et al., 2021; Duan et al., 2024a; Duan et al., 2024b; Duan et al., 2025b). These methods typically employ graph neural networks (GNNs) or graph autoencoders to learn latent representations from cell-by-gene expression matrices and spatial coordinates, followed by clustering algorithms to identify spatial domains (Dong and Zhang, 2022; Long et al., 2023). While effective, these approaches have a fundamental limitation: they treat genes merely as numerical features without leveraging the extensive biological knowledge accumulated about gene functions, pathways, and interactions.
Recent advances in large language models (LLMs) have demonstrated remarkable capabilities in understanding biological text. The GenePT framework has shown that LLM embeddings of gene descriptions from NCBI can effectively capture biological relationships and improve downstream tasks in single-cell analysis (Chen and Zou, 2024). Specifically, GenePT uses pre-computed OpenAI text embeddings on NCBI gene summaries, demonstrating that these embeddings often outperform expression-based methods for various biological tasks.
Motivated by these observations, we propose SpaLLM, as shown in Figure 1, a general framework that integrates LLM-derived gene functional features with traditional spatial transcriptomics analysis. Our key insight is that gene functional descriptions contain rich biological knowledge that can inform spatial domain identification beyond what expression patterns alone can reveal. By leveraging pre-trained language models to encode gene descriptions from biological databases, we create biologically-informed gene representations that capture functional relationships, pathway memberships, and molecular mechanisms.
Figure 1. Overview of SpaLLM framework for integrating gene functional knowledge into spatial domain identification. Left: Input spatial transcriptomics data consists of tissue slices with observed gene expression matrix
The SpaLLM framework introduces a novel approach that enhances traditional spatial transcriptomics analysis by integrating gene functional knowledge. Following the standard encoder-decoder paradigm, we first obtain cell embeddings from ST data using existing graph-based methods. Simultaneously, we derive functional cell embeddings by multiplying the cell-by-gene expression matrix with LLM-derived gene embeddings
Our main contributions are as follows:
• We introduce the first systematic framework to integrate LLM embeddings of gene functional descriptions into spatial transcriptomics analysis;
• We demonstrate that incorporating biological knowledge through gene text descriptions significantly improves spatial domain identification accuracy, particularly for low-quality data;
• We provide a modular framework compatible with existing spatial analysis methods, enabling broad adoption across different research pipelines;
• We conduct comprehensive experiments across multiple datasets with varying quality levels, showing consistent relative ARI improvements of 3.8%–400% over state-of-the-art methods.
2 Methods
2.1 Problem formulation and SpaLLM architecture
We formulate the problem of spatial domain identification as a clustering task on a graph. Let
The SpaLLM framework enhances this process by incorporating biological knowledge from gene functional descriptions. We introduce a dual-stream encoding architecture that combines a conventional spatial encoder with a novel LLM-based functional encoder, as illustrated in Figure 1. The outputs of these two encoders are fused to produce the final enriched embeddings
2.2 Gene functional encoding
To capture the biological meaning of genes, we leverage pre-computed embeddings from a large language model trained on a corpus of gene functional descriptions (e.g., NCBI gene summaries). Let
Here,
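The functional encoding step can be illustrated with a minimal sketch, assuming the gene embeddings are available as a dictionary from gene symbol to vector. The function name `functional_cell_embeddings`, the row normalisation of the expression matrix, and the toy two-dimensional vectors are assumptions made for illustration and are not taken from the SpaLLM implementation; real GenePT vectors are much longer.

```python
import numpy as np

def functional_cell_embeddings(expr, gene_names, gene_embeddings):
    """Project spots into the gene-embedding space: Z_func = W @ E."""
    # Keep only genes that have a text embedding (hypothetical dict: symbol -> vector).
    keep = [j for j, g in enumerate(gene_names) if g in gene_embeddings]
    if not keep:
        raise ValueError("no gene in the panel has an embedding")
    X = np.asarray(expr, dtype=float)[:, keep]            # (n_spots, n_kept)
    E = np.stack([np.asarray(gene_embeddings[gene_names[j]], dtype=float)
                  for j in keep])                         # (n_kept, d)
    # Assumption: row-normalise so each spot becomes an expression-weighted
    # average of gene vectors; spots with zero counts map to the zero vector.
    totals = X.sum(axis=1, keepdims=True)
    W = np.divide(X, totals, out=np.zeros_like(X), where=totals > 0)
    return W @ E

# Toy example: three spots, three genes, one gene without an embedding.
genes = ["SNAP25", "MBP", "GFAP"]
emb = {"SNAP25": [1.0, 0.0], "MBP": [0.0, 1.0]}
X = np.array([[2, 2, 5], [0, 4, 1], [0, 0, 3]])
Z_func = functional_cell_embeddings(X, genes, emb)
```

Genes without a description are simply dropped here; other choices, such as imputing a mean vector, would also be compatible with the matrix-multiplication idea.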
2.3 Spatial encoder
The spatial encoder captures both gene expression patterns and spatial context. We construct a spatial graph
Here,
The spatial encoder,
The GNN layers are typically defined by a message passing scheme. For a multi-layer GNN, the update rule for the
where
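As a rough illustration of the spatial encoder, the sketch below builds a k-nearest-neighbour spatial graph and applies one message-passing step. It assumes a GCN-style symmetric normalisation; the baselines used with SpaLLM rely on their own encoders (for example graph attention in STAGATE), so `knn_adjacency` and `gcn_layer` are hypothetical helpers rather than the authors' code.

```python
import numpy as np

def knn_adjacency(coords, k):
    """Symmetric k-nearest-neighbour adjacency from spot coordinates."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a spot is not its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]       # indices of the k closest spots
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(A, A.T)                    # make the graph undirected

def gcn_layer(A, H, W):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

# Four spots on a line; with k=1 the symmetrised graph is the path 0-1-2-3.
coords = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
A = knn_adjacency(coords, k=1)
H_out = gcn_layer(A, np.eye(4), np.eye(4))       # identity features and weights
```

With identity features and weights the layer output equals the normalised adjacency, which makes the effect of the degree normalisation easy to inspect.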
2.4 Feature fusion and clustering
The embeddings from the two encoders,
The final spot representation
The hyperparameters
Finally, the spatial domains are identified by applying a clustering algorithm, such as K-means, Louvain, or mclust, to the fused feature matrix
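The fusion and clustering step could look like the following sketch. It assumes one plausible reading of weighted fusion: each stream is standardised, scaled by a weight, and concatenated. The weight names `alpha` and `beta` are hypothetical, since the original symbols are not recoverable here, and the small k-means routine stands in for whichever clustering algorithm is used in practice.

```python
import numpy as np

def standardize(Z):
    """Zero-mean, unit-variance columns so neither stream dominates by scale."""
    Z = np.asarray(Z, dtype=float)
    std = Z.std(axis=0)
    std[std == 0] = 1.0
    return (Z - Z.mean(axis=0)) / std

def weighted_fusion(z_spatial, z_func, alpha=1.0, beta=1.0):
    """Scale each stream by its weight, then concatenate along the feature axis."""
    return np.hstack([alpha * standardize(z_spatial), beta * standardize(z_func)])

def kmeans(Z, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(k - 1):
        dist = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

# Toy example: two well-separated groups of spots in both streams.
z_spatial = np.array([[0.0], [0.1], [5.0], [5.1]])
z_func = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
fused = weighted_fusion(z_spatial, z_func, alpha=1.0, beta=0.5)
labels = kmeans(fused, k=2)
```

Under this reading, setting both weights to 1 reduces to plain concatenation of the standardised streams, which is why the two strategies can be compared directly in an ablation.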
3 Experimental setup
3.1 Datasets and data quality simulation
We evaluate SpaLLM on the human dorsolateral prefrontal cortex (DLPFC) spatial transcriptomics datasets from Maynard et al. (2021), which consist of 12 tissue sections with manually annotated spatial domains. To assess the robustness of our framework against varying data quality, we adopt a systematic simulation strategy in which data quality is reduced by introducing sparsity (Duan et al., 2025a). The original, unaltered data (Q0) serves as a baseline for comparison. We create four simulated quality levels by randomly masking a percentage of the non-zero gene expression values: Q1 (50% masked), Q2 (75% masked), Q3 (87.5% masked), and Q4 (93.75% masked). This process generates a comprehensive testbed of 48 simulated datasets (12 original sections × 4 quality levels).
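A minimal sketch of this masking procedure is shown below. The function `mask_nonzero` and the choice to mask each quality level independently from the original matrix are assumptions; the text does not say whether the levels are nested.

```python
import numpy as np

# Fraction of non-zero entries removed at each simulated quality level.
QUALITY_LEVELS = {"Q1": 0.5, "Q2": 0.75, "Q3": 0.875, "Q4": 0.9375}

def mask_nonzero(expr, fraction, seed=0):
    """Return a copy of expr with `fraction` of its non-zero entries set to zero."""
    rng = np.random.default_rng(seed)
    out = np.array(expr, dtype=float, copy=True)
    rows, cols = np.nonzero(out)
    n_mask = int(round(fraction * rows.size))
    chosen = rng.choice(rows.size, size=n_mask, replace=False)
    out[rows[chosen], cols[chosen]] = 0.0
    return out

# Toy matrix with 32 non-zero entries.
X = np.arange(1, 33).reshape(4, 8)
remaining = {q: int(np.count_nonzero(mask_nonzero(X, f)))
             for q, f in QUALITY_LEVELS.items()}
```

Each level halves the number of surviving non-zero entries relative to the previous one, so Q4 retains only 6.25% of the original non-zero values.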
To validate cross-platform generalizability, we incorporated the osmFISH dataset (Codeluppi et al., 2018) of the mouse somatosensory cortex. This dataset utilizes imaging-based technology, which provides a higher spatial resolution but fewer genes compared to the sequencing-based Visium platform, offering a complementary modality for testing.
3.2 Implementation details
3.2.1 GenePT feature configuration
We use the pre-computed GenePT embeddings (Chen and Zou, 2024) with the following specifications:
• Embedding model: OpenAI text-embedding-ada-002.
• Feature dimension:
• Gene coverage: 33,000+ genes with NCBI annotations.
3.2.2 Model hyperparameters
We adopt four representative baselines: SpaceFlow (Ren et al., 2022), STAGATE (Dong and Zhang, 2022), GraphST (Long et al., 2023), and stCluster (Wang et al., 2024). The hyperparameters for each method are set following the configurations recommended in their original papers. For SpaLLM integration, we set the weighting parameters
3.3 Evaluation metrics
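We evaluate spatial domain identification using the Adjusted Rand Index (ARI), a robust metric for measuring the similarity between a predicted clustering and the ground truth. The ARI corrects for chance agreement: it equals 1.0 for a perfect clustering, has an expected value of 0 for random assignments, and can be negative when agreement is worse than chance. In its standard form, the ARI is defined as:
$$\mathrm{ARI} = \frac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_{i}\binom{a_i}{2}\sum_{j}\binom{b_j}{2}\right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[\sum_{i}\binom{a_i}{2} + \sum_{j}\binom{b_j}{2}\right] - \left[\sum_{i}\binom{a_i}{2}\sum_{j}\binom{b_j}{2}\right] \Big/ \binom{n}{2}}$$
where $n_{ij}$ is the number of spots assigned to ground-truth domain $i$ and predicted cluster $j$, $a_i = \sum_j n_{ij}$ and $b_j = \sum_i n_{ij}$ are the corresponding row and column totals of the contingency table, and $n$ is the total number of spots.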
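For readers who want to check ARI values by hand, the following sketch computes the standard contingency-table form of the index using only the Python standard library. The function name is illustrative; the comments label the usual counts (`n_ij` for the joint counts, `a_i` and `b_j` for the marginal counts of the true and predicted labelings).

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI computed from the contingency table of two labelings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0                                      # no pairs to disagree on
    pairs = Counter(zip(labels_true, labels_pred))      # n_ij
    rows = Counter(labels_true)                         # a_i
    cols = Counter(labels_pred)                         # b_j
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in rows.values())
    sum_b = sum(comb(c, 2) for c in cols.values())
    expected = sum_a * sum_b / comb(n, 2)               # chance-level agreement
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:                           # e.g. one cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

truth = [0, 0, 1, 1]
ari_perfect = adjusted_rand_index(truth, [1, 1, 0, 0])  # same partition, relabelled
ari_partial = adjusted_rand_index(truth, [0, 0, 1, 2])  # one domain split in two
ari_worse = adjusted_rand_index(truth, [0, 1, 0, 1])    # worse than chance
```

The first case shows that the index is invariant to label permutation, and the last shows that it can fall below zero.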
4 Results
4.1 SpaLLM demonstrates consistent improvements across quality levels
We evaluated SpaLLM’s performance on spatial domain identification across 12 real-world and 48 simulated datasets with varying quality levels (Q0-Q4, where Q0 represents the original highest quality datasets and Q4 the lowest). Tables 2–4 present comprehensive results comparing four baseline methods (SpaceFlow, STAGATE, GraphST, and stCluster) with their SpaLLM-enhanced versions across three different donor samples. Results are averaged across ten runs.
Table 2. Spatial domain identification performance (ARI) on Donor 1 datasets. The best performance for each quality level and dataset is bolded.
Table 3. Spatial domain identification performance (ARI) on Donor 2 datasets. The best performance for each quality level and dataset is bolded.
Table 4. Spatial domain identification performance (ARI) on Donor 3 datasets. The best performance for each quality level and dataset is bolded.
The integration of SpaLLM with baseline methods shows consistent improvements across all quality levels and datasets. Notably, the performance gains become more pronounced as data quality decreases, highlighting SpaLLM’s robustness in challenging scenarios where traditional methods struggle.
Across all three donor samples, SpaceFlow integration with SpaLLM achieved modest but consistent improvements ranging from 3.8% to 9.4% in high-quality datasets (Q0, Q1) to more substantial gains of 11.1%–60.0% in degraded datasets (Q3, Q4). For example, in Donor 1 dataset 151509, SpaceFlow improved from 0.25 to 0.28 (12% gain) at Q3 level, while in dataset 151508, Q4 performance increased from 0.10 to 0.13 (30% gain).
STAGATE integration with SpaLLM demonstrated the most substantial improvements among all tested methods. In high-quality scenarios (Q0, Q1), improvements ranged from 5.3% to 9.4%, with notable examples including Donor 1 dataset 151507 improving from 0.55 to 0.58 (5.5% gain) at Q0. However, the most dramatic gains occurred in degraded data scenarios, with Q3 and Q4 improvements reaching 25%–400%. For instance, in the Donor 2 dataset 151670, Q4 performance surged from 0.01 to 0.05 (400% improvement), and in the Donor 1 dataset 151507, Q3 performance increased from 0.12 to 0.16 (33% gain).
GraphST showed significant benefits from SpaLLM integration, with improvements ranging from 6.2% to 13.8% in high-quality datasets to remarkable gains of up to 400% in the most challenging scenarios. In Donor 2, GraphST + SpaLLM achieved the highest overall performance in Q0 and Q1 levels, with values reaching 0.69 and 0.67, respectively, for dataset 151672. The most striking improvements were observed in Q4 scenarios, where performance increased from as low as 0.01 to 0.05 (400% improvement).
stCluster integration yielded improvements across all quality levels. High-quality datasets (Q0, Q1) showed improvements ranging from 4.7% to 8.6%, while degraded scenarios (Q3, Q4) demonstrated gains of 10.0%–50.0%. Notably, stCluster + SpaLLM achieved several best performances in Q2–Q4 categories, such as 0.44 (Q2) and 0.24 (Q3) in the Donor 1 dataset 151507.
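The percentage gains reported in this section are relative improvements over the baseline ARI. The short sketch below recomputes several of them from the ARI pairs quoted above; the helper `relative_gain` is an illustrative name.

```python
def relative_gain(baseline, enhanced):
    """Relative ARI improvement of `enhanced` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("relative gain is undefined for a non-positive baseline")
    return 100.0 * (enhanced - baseline) / baseline

# (baseline ARI, SpaLLM-enhanced ARI) pairs quoted in this section.
examples = {
    "SpaceFlow, 151509, Q3": (0.25, 0.28),
    "SpaceFlow, 151508, Q4": (0.10, 0.13),
    "STAGATE, 151507, Q0": (0.55, 0.58),
    "STAGATE, 151507, Q3": (0.12, 0.16),
    "STAGATE, 151670, Q4": (0.01, 0.05),
}
gains = {name: round(relative_gain(b, e), 1) for name, (b, e) in examples.items()}
```

Note that the largest relative gains arise where the baseline ARI is very small: the 400% improvement for STAGATE on 151670 at Q4 corresponds to an absolute increase of 0.04 ARI.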
4.2 Ablation studies: synergistic effects of LLM priors and integration strategies
To dissect the specific contributions of the individual components within the SpaLLM framework, we performed comprehensive ablation studies focusing on the feature integration strategy and the choice of the LLM embedding model. We utilized the STAGATE baseline on both the DLPFC (151507) and osmFISH datasets as representative cases.
As quantified in Table 5, we compared four distinct integration strategies: (1) Expression only (standard pipeline), (2) Functional only (LLM knowledge only), (3) Simple concatenation, and (4) Weighted fusion (SpaLLM). The results demonstrate a “synergistic effect.” While expression features are dominant in high-quality data (Q0), they suffer from a catastrophic performance collapse as sparsity increases; for example, the ARI on DLPFC drops from 0.55 (Q0) to a mere 0.02 (Q4). In contrast, the Functional only approach maintains remarkable stability (e.g., maintaining an ARI of 0.132 on osmFISH even at Q4), indicating that biological priors act as a vital regularizer when the transcriptomic signal is severely degraded.
Table 5. Comprehensive ablation study results (ARI) across DLPFC and osmFISH datasets using STAGATE.
Furthermore, we evaluated two OpenAI embedding models: text-embedding-ada-002 (ada) and text-embedding-3-large (large). Our analysis shows that while simple concatenation yields only marginal gains, our Weighted fusion strategy achieves the best overall performance. Notably, the large model variant exhibits superior robustness in the most challenging scenarios (Q3–Q4), providing the highest ARI across both datasets. However, ada offers comparable performance at lower computational overhead, so we selected it as the default configuration.
4.3 Cross-platform generalizability: validation on osmFISH
To test whether SpaLLM generalizes across technologies, we extended our evaluation to the osmFISH dataset (mouse somatosensory cortex). Unlike sequencing-based platforms, osmFISH is an imaging-based technology with a high spatial resolution but a targeted gene panel (33 marker genes).
We applied SpaLLM to four representative baselines: SpaceFlow, STAGATE, GraphST, and stCluster. As summarized in Table 6, SpaLLM consistently improved the ARI across all quality levels for every baseline. For instance, GraphST + SpaLLM achieved the highest ARI of 0.52 at Q0 (compared to 0.48 for base GraphST) and maintained a significant lead even at Q4 (0.18 vs. 0.09). These results, accompanied by lower performance variance (standard deviations), suggest that integrating LLM-derived knowledge enhances spatial domain identification across the experimental modalities and algorithmic architectures tested here.
Table 6. Full performance comparison (ARI) on the osmFISH dataset across different data quality levels.
4.4 Practical guidance for low-throughput and small-sample regions
To provide concrete guidance for practitioners working with limited tissue sections, we analyzed the performance of SpaLLM on small spatial subregions. We randomly extracted 10 contiguous subregions, each consisting of only 1,000 cells, from the osmFISH tissue.
As summarized in Table 7, the relative performance improvement introduced by SpaLLM is even more pronounced in these small-sample scenarios compared to full-tissue analysis. For example, GraphST’s performance gain increases from 8.3% on full tissue to 27.3% on subregions. This suggests that when spatial context is limited, LLM-derived gene functional knowledge helps anchor the identity of cell clusters, effectively compensating for the lack of local cell-cell interaction information. Based on these results, we recommend SpaLLM as a critical enhancement for experiments involving small biopsies or sparse cell populations where traditional methods often fail to recover clear domain boundaries.
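One simple way to draw such subregions is sketched below, assuming that a contiguous patch can be approximated by the cells nearest to a randomly chosen centre cell. The text does not specify how contiguity was enforced, so `sample_subregions` is a hypothetical helper and not the procedure used to produce Table 7.

```python
import numpy as np

def sample_subregions(coords, n_regions=10, region_size=1000, seed=0):
    """Return index arrays, each holding `region_size` spatially adjacent cells."""
    coords = np.asarray(coords, dtype=float)
    if region_size > coords.shape[0]:
        raise ValueError("region_size exceeds the number of cells")
    rng = np.random.default_rng(seed)
    regions = []
    for _ in range(n_regions):
        centre = coords[rng.integers(coords.shape[0])]    # random centre cell
        dist = np.linalg.norm(coords - centre, axis=1)
        regions.append(np.argsort(dist)[:region_size])    # nearest cells form the patch
    return regions

# Toy tissue: a 20 x 20 grid of cells, three patches of 50 cells each.
xs, ys = np.meshgrid(np.arange(20), np.arange(20))
grid = np.column_stack([xs.ravel(), ys.ravel()])
regions = sample_subregions(grid, n_regions=3, region_size=50)
```

Each returned index array can be used to subset both the expression matrix and the coordinates before running a baseline with and without the functional features.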
5 Conclusion and discussion
We presented SpaLLM, a general framework that integrates large language model embeddings of gene functional descriptions into spatial transcriptomics analysis. By leveraging pre-computed GenePT features and combining them with expression data through weighted matrix integration, SpaLLM consistently improves spatial domain identification across varying data quality conditions.
Our comprehensive evaluation on 12 sequencing-based DLPFC datasets and an independent imaging-based osmFISH dataset demonstrates substantial improvements in clustering accuracy. Relative ARI gains range from roughly 4% in high-quality data to as much as 400% in severely degraded scenarios, where baseline ARI values are small. The modular design enables seamless integration with existing spatial analysis methods including SpaceFlow, STAGATE, GraphST, and stCluster, making SpaLLM broadly applicable to diverse research scenarios regardless of the underlying experimental modality.
The success of SpaLLM demonstrates several key advantages: incorporating gene functional knowledge leads to more biologically meaningful clustering results, as evidenced by consistent improvements across all tested methods. Detailed ablation studies confirm that our weighted fusion strategy outperforms simple concatenation, and while newer models like text-embedding-3-large provide superior stability in extreme sparsity, text-embedding-ada-002 remains a highly efficient default for routine analysis. Functional features provide stable signals even when expression data is sparse or degraded, with the most dramatic improvements observed in Q3 and Q4 quality scenarios. Furthermore, our subregion analysis reveals that SpaLLM is particularly transformative for small-scale tissue samples (e.g., 1,000 cells), where the relative improvement in ARI reaches up to 27.3%, effectively compensating for limited spatial context.
While SpaLLM shows consistent effectiveness, its current implementation depends on the accuracy of large language model embeddings for capturing gene functional relationships. However, with the rapid advancement of language model architectures and the continuous expansion of biological knowledge databases, we anticipate that this limitation will be progressively overcome, leading to even more precise functional representations that better capture the complexity of biological systems.
This work establishes a foundation for knowledge-guided spatial omics analysis and demonstrates the potential for large language models to enhance biological discovery through the systematic integration of functional knowledge. The consistent improvements across diverse datasets, varying tissue sizes, and methods suggest that functional knowledge integration represents a promising paradigm for advancing spatial transcriptomics analysis.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.
Author contributions
ZZ: Data curation, Methodology, Conceptualization, Software, Investigation, Writing – review and editing, Formal Analysis, Resources, Visualization, Writing – original draft, Funding acquisition, Project administration. ZD: Writing – review and editing, Writing – original draft, Supervision.
Funding
The author(s) declared that financial support was not received for this work and/or its publication.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Chen, Y., and Zou, J. (2024). GenePT: a simple but effective foundation model for genes and cells built from ChatGPT. bioRxiv, 10. doi:10.1101/2023.10.16.562533
Chen, A., Liao, S., Cheng, M., Ma, K., Wu, L., Lai, Y., et al. (2022). Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792. doi:10.1016/j.cell.2022.04.003
Codeluppi, S., Borm, L. E., Zeisel, A., La Manno, G., van Lunteren, J. A., Svensson, C. I., et al. (2018). Spatial organization of the somatosensory cortex revealed by osmFISH. Nat. Methods 15, 932–935. doi:10.1038/s41592-018-0175-z
Dong, K., and Zhang, S. (2022). Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739. doi:10.1038/s41467-022-29439-6
Duan, Z., Riffle, D., Li, R., Liu, J., Min, M. R., and Zhang, J. (2024a). Impeller: a path-based heterogeneous graph learning method for spatial transcriptomic data imputation. Bioinformatics 40, btae339. doi:10.1093/bioinformatics/btae339
Duan, Z., Xu, S., Lee, C., Riffle, D., and Zhang, J. (2024b). “Imiracle: an iterative multi-view graph neural network to model intercellular gene regulation from spatial transcriptomic data,” in Proceedings of the 33rd ACM international conference on information and knowledge management, 538–548.
Duan, Z., Li, X., Xiao, Z., Ying, R., and Zhang, J. (2025a). “Muse: a multi-slice joint analysis method for spatial transcriptomics experiments,” in Proceedings of the 34th ACM international conference on information and knowledge management (New York, NY, USA: Association for Computing Machinery, CIKM ’25), 625–634. doi:10.1145/3746252.3761240
Duan, Z., Li, X., Zhang, Z., Song, J., and Zhang, J. (2025b). “Disco: a diffusion model for spatial transcriptomics data completion,” in 2025 IEEE international conference on image processing (ICIP) (IEEE), 19–24.
Emani, P. S., Liu, J. J., Clarke, D., Jensen, M., Warrell, J., Gupta, C., et al. (2024). Single-cell genomics and regulatory networks for 388 human brains. Science 384, eadi5199. doi:10.1126/science.adi5199
Hu, J., Li, X., Coleman, K., Schroeder, A., Ma, N., Irwin, D. J., et al. (2021). SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351. doi:10.1038/s41592-021-01255-8
Long, Y., Ang, K. S., Li, M., Chong, K. L. K., Sethi, R., Zhong, C., et al. (2023). Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat. Commun. 14, 1155. doi:10.1038/s41467-023-36796-3
Maynard, K. R., Collado-Torres, L., Weber, L. M., Uytingco, C., Barry, B. K., Williams, S. R., et al. (2021). Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436. doi:10.1038/s41593-020-00787-0
Ren, H., Walker, B. L., Cang, Z., and Nie, Q. (2022). Identifying multicellular spatiotemporal organization of cells with SpaceFlow. Nat. Commun. 13, 4076. doi:10.1038/s41467-022-31739-w
Rodriques, S. G., Stickels, R. R., Goeva, A., Martin, C. A., Murray, E., Vanderburg, C. R., et al. (2019). Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467. doi:10.1126/science.aaw1219
Ruzicka, W. B., Mohammadi, S., Fullard, J. F., Davila-Velderrain, J., Subburaju, S., Tso, D. R., et al. (2024). Single-cell multi-cohort dissection of the schizophrenia transcriptome. Science 384, eadg5136. doi:10.1126/science.adg5136
Wang, T., Shu, H., Hu, J., Wang, Y., Chen, J., Peng, J., et al. (2024). Accurately deciphering spatial domains for spatially resolved transcriptomics with stCluster. Brief. Bioinform. 25, bbae329. doi:10.1093/bib/bbae329
Keywords: graph neural networks, large language models, multimodality, spatial domain identification, spatial transcriptomics
Citation: Zou Z and Duan Z (2026) SpaLLM: a general framework for spatial domain identification with large language models. Front. Bioinform. 5:1713975. doi: 10.3389/fbinf.2025.1713975
Received: 26 September 2025; Accepted: 30 December 2025;
Published: 12 January 2026.
Edited by:
Martin Hemberg, Harvard Medical School, United States
Reviewed by:
Hyun Jung Park, University of Pittsburgh, United States
Alexandro Trevino, Enable Medicine, United States
Copyright © 2026 Zou and Duan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Ziheng Duan, zihend1@uci.edu
Zeyu Zou