<?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
      <channel xmlns:content="http://purl.org/rss/1.0/modules/content/">
        <title>Frontiers in Bioinformatics | Single Cell Bioinformatics section | New and Recent Articles</title>
        <link>https://www.frontiersin.org/journals/bioinformatics/sections/single-cell-bioinformatics</link>
        <description>RSS Feed for Single Cell Bioinformatics section in the Frontiers in Bioinformatics journal | New and Recent Articles</description>
        <language>en-us</language>
        <generator>Frontiers Feed Generator,version:1</generator>
        <pubDate>2026-05-13T06:16:02.95+00:00</pubDate>
        <ttl>60</ttl>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1767362</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1767362</link>
        <title><![CDATA[Machine learning approaches for biomarker discovery using single-cell RNA sequencing]]></title>
        <pubdate>2026-04-02T00:00:00Z</pubdate>
        <category>Review</category>
        <author>Gabriel Dewa</author><author>C. Mee Ling Munier</author><author>Sara Ballouz</author><author>Raymond Louie</author>
        <description><![CDATA[The application of single-cell RNA sequencing (scRNA-seq) for biomarker discovery promises unprecedented resolution in identifying potential biomarkers by capturing and analysing cellular heterogeneity. Traditionally, biomarker discovery efforts within single-cell transcriptomics have primarily relied on conventional statistical approaches, particularly through the application of differential gene expression analysis, to identify candidate biomarkers. However, in recent years, with the rapid advancement and growing popularity of artificial intelligence and machine learning, their application in scRNA-seq biomarker discovery has become increasingly prominent. Currently, machine learning-based approaches for scRNA-seq biomarker discovery exhibit considerable methodological diversity, which can be distinguished by factors such as the level of discovery, choice of supervised learning algorithm, feature selection methods, classification metrics, and downstream biological analyses. This review provides a comprehensive overview of the current landscape of machine learning methods for scRNA-seq biomarker discovery, offering researchers a complete and detailed understanding of the field.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1672671</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1672671</link>
        <title><![CDATA[Integrating trajectory inference and self-explainable predictive models to explore cell state transitions in breast cancer at single-cell resolution]]></title>
        <pubdate>2026-03-04T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Vanessa Verrina</author><author>Marianna Talia</author><author>Eugenio Cesario</author><author>Santina Capalbo</author><author>Domenica Scordamaglia</author><author>Rosamaria Lappano</author><author>Anna Maria Miglietta</author><author>Marcello Maggiolini</author><author>Sabrina Giordano</author>
        <description><![CDATA[Introduction: Breast cancer is characterized by a highly heterogeneous cellular environment composed of diverse malignant clones and components of the tumor microenvironment (TME) that collectively influence disease progression. Single-cell RNA sequencing (scRNA-seq) offers a powerful tool to dissect this complexity, enabling high-resolution characterization of tumor heterogeneity and functional interactions within the TME. Moreover, it supports the discovery of clinically relevant subpopulations and potential therapeutic targets. Methods: In this study, we present a novel scRNA-seq dataset from an infiltrating ductal breast cancer, profiling over 5,000 cells and identifying six distinct clusters spanning cancer and TME populations. To explore the molecular drivers of cell state transitions, we integrate pseudotime trajectory inference with interpretable, tree-based machine learning. This combined approach enables the identification of key genes and expression thresholds associated with dynamic phenotypic shifts. Results: Our analysis identified six distinct cellular clusters representing both malignant and TME populations. The integration of pseudotime inference with interpretable machine learning uncovered key genes and specific expression thresholds associated with transcriptional reprogramming and dynamic phenotypic transitions during tumor evolution. Discussion: Unlike black-box models, our framework provides transparent, rule-based insights into transcriptional reprogramming processes underlying tumor progression. The resulting dataset, together with an accessible and transparent analytical pipeline, represents a valuable resource for the breast cancer research community and establishes a foundation for future studies aimed at refining molecular classification and advancing precision therapy development.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1715821</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1715821</link>
        <title><![CDATA[Applications of AI to single-cell and spatial transcriptomics: current state-of-the-art and challenges]]></title>
        <pubdate>2026-01-27T00:00:00Z</pubdate>
        <category>Review</category>
        <author>Boris Tchatchoua Ngassam</author><author>Huilin Niu</author><author>Sunny Pang</author><author>Valeryia Shydlouskaya</author><author>Tallulah S. Andrews</author>
        <description><![CDATA[Artificial intelligence (AI) has become a common tool for bioinformatics, with hundreds of methods published in recent years. Due to the training data demands of deep-learning algorithms, high-throughput single-cell and spatial transcriptomics is one of the most popular areas for these applications. Here we review how AI is being used for single-cell and spatial transcriptomics analysis, and how these approaches compare to alternative statistical or heuristic-based methods. We explored 10 common analysis tasks: dimensionality reduction, cross-dataset integration, data denoising, data augmentation, deconvolution, cell-cell interactions, transcriptional velocity, transcriptomic-chromatin accessibility integration, and integrating single-cell and spatial transcriptomics modalities. We highlight which algorithms are likely to be useful for discovery researchers, and which are not yet ready for general research use.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1697212</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1697212</link>
        <title><![CDATA[High-dimensional co-expression network analysis reveals persistent TRH gene expression throughout axolotl telencephalon regeneration]]></title>
        <pubdate>2026-01-12T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Iveth Gómez-Morales</author><author>Adriana P. Mendizabal-Ruiz</author><author>J. Alejandro Morales</author><author>Teresa Romero-Gutiérrez</author>
        <description><![CDATA[Introduction: The axolotl (Ambystoma mexicanum) offers a deep insight into brain regeneration by fully reconstructing its telencephalon post-injury, a capability that most vertebrates do not have. This study aimed to identify hub genes (highest-weighted genes) underlying this process and to map their cell location by analyzing spatiotemporal transcriptomic data using high-dimensional weighted gene co-expression network analysis, integrating protein-protein interaction networks, and cross-validating findings through literature. Results: We identified 180 hub genes across the regeneration timeline, including several with conserved orthologs previously reported in vertebrate regeneration models. Among these candidates, TRH (Thyrotropin-Releasing Hormone) displayed the most consistent spatiotemporal pattern, appearing repeatedly as a hub gene and localizing to MSN-enriched regions at multiple stages. TRH is broadly characterized in vertebrates as a neuroendocrine peptide with roles in hormonal signaling, and MSNs are known to respond to a variety of hormonal and neuropeptidergic cues. In our dataset, this background provides additional perspective on the transcriptional configurations in which TRH appears. Other hub genes showed stage- or cell-specific patterns, together outlining a heterogeneous and dynamic landscape of transcriptional states detected during telencephalon regeneration. Conclusion: This study provides a descriptive map of gene co-expression dynamics during axolotl telencephalon regeneration. By integrating hdWGCNA, spatial transcriptomics, and network-based context, we identify hub genes and transcriptional states associated with injury response, including a persistent TRH-linked MSN state. These findings offer a foundation for future experimental studies aimed at elucidating the molecular basis of axolotl brain repair.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1740715</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1740715</link>
        <title><![CDATA[Integrative transcriptomic analysis reveals microglial metabolic-inflammatory crosstalk of HK2–HSPA5–TNF axis after intracerebral hemorrhage]]></title>
        <pubdate>2026-01-12T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Yi Zhang</author><author>Yongqian Liu</author><author>Wei Meng</author><author>Xiaobo Yu</author><author>Xiaojun Xu</author>
        <description><![CDATA[Background: Intracerebral hemorrhage (ICH) triggers secondary brain injury through neuroinflammation, yet the interplay between metabolic reprogramming and inflammatory responses remains poorly defined. This study investigated how glucose metabolism dysregulation contributes to neuroinflammatory pathogenesis following ICH. Methods: We integrated transcriptomic datasets from bulk RNA sequencing (human perihematomal tissue), single-cell RNA sequencing (mouse ICH model), and spatial transcriptomics (mouse time-series). Bioinformatic analyses included differential expression screening, single-cell weighted gene co-expression network analysis, pseudotemporal trajectory reconstruction, and cell-cell communication inference to identify key metabolic-inflammation regulators and their spatiotemporal dynamics. Results: Multi-omics convergence revealed hexokinase 2 (HK2), heat shock protein A5 (HSPA5), and tumor necrosis factor (TNF) as core regulators linking glucose metabolism to neuroinflammation. Single-cell analysis showed significant time-dependent regulation of HK2 in microglia, while spatial transcriptomics uncovered synchronized alterations of HK2, HSPA5, and TNF in perihematomal regions at day 7. Cell communication analysis highlighted enhanced microglia-to-neutrophil signaling via Tnf-Tnfrsf1b pairs, with TNF signaling identified as the most significantly upregulated pathway in ICH conditions. Conclusion: Our multi-omics approach reveals coordinated dysregulation of glucose metabolism and inflammatory genes following ICH, with time-dependent HK2 regulation in microglia and synchronized transcriptional changes at day 7 representing critical events in neuroinflammatory progression. The identified gene networks and cellular communication patterns provide new insights into the metabolic-immune interface in ICH, offering potential targets for future therapeutic strategies.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1713975</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1713975</link>
        <title><![CDATA[SpaLLM: a general framework for spatial domain identification with large language models]]></title>
        <pubdate>2026-01-12T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Zeyu Zou</author><author>Ziheng Duan</author>
        <description><![CDATA[Spatial transcriptomics (ST) technologies enable the profiling of gene expression while preserving spatial context, offering unprecedented insights into tissue organization. However, traditional spatial domain identification methods primarily rely on gene expression matrices and spatial coordinates while overlooking the rich biological knowledge encoded in gene functional descriptions. Here, we propose SpaLLM, a general framework that integrates large language model (LLM) embeddings of gene descriptions with conventional spatial transcriptomics analysis. Our approach leverages pre-computed GenePT embeddings from NCBI gene summaries to create biologically-informed gene representations. SpaLLM combines these LLM-derived gene features with cell-gene expression matrices through matrix multiplication, generating enriched cell representations that capture both expression patterns and functional knowledge. These enriched features are then integrated with existing graph-based spatial analysis methods for improved spatial domain identification. Extensive validation on 12 sequencing-based Visium sections and an independent imaging-based osmFISH dataset demonstrate that SpaLLM consistently enhances spatial domain identification. Our modular framework can be seamlessly integrated with existing spatial analysis pipelines, making it broadly applicable to diverse research scenarios.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1684227</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1684227</link>
        <title><![CDATA[Celline: a flexible tool for one-step retrieval and integrative analysis of public single-cell RNA sequencing data]]></title>
        <pubdate>2025-12-11T00:00:00Z</pubdate>
        <category>Technology and Code</category>
        <author>Yuya Sato</author><author>Toru Asahi</author><author>Kosuke Kataoka</author>
        <description><![CDATA[Single-cell RNA sequencing (scRNA-seq) has generated a rapidly expanding collection of public datasets that provide insight into development, disease, and therapy. However, researchers lack an end-to-end solution for seamlessly retrieving, preprocessing, integrating, and analyzing these data because existing tools address only isolated steps and require manual curation of accessions, metadata, and technical variability, known as batch effects. In this study, we developed Celline, a Python package that executes an entire workflow using single-line commands per step. Celline automatically gathers raw single-cell RNA-seq data from multiple public repositories and extracts metadata using large language models. It then wraps established tools, including Scrublet for doublet removal, Seurat and Scanpy for quality control and cell-type annotation, Harmony and scVI for batch correction, and Slingshot for trajectory inference, into one-line commands, enabling seamless integrative analyses. To validate Celline-acquired data quality and the integrated framework’s practical utility, we applied it to 2 mouse brain cortex datasets from embryonic days 14.5 and 18. Technical validation demonstrated that Celline successfully retrieved data, standardized metadata, and enabled standard analyses that removed low-quality cells, annotated 11 major cell types, improved integration quality (scIB score +0.22), and completed trajectory analysis. Thus, Celline transforms scattered public scRNA-seq resources into unified, analysis-ready datasets with minimal effort. Its modular design allows pipeline extension, encourages community-driven advances, and accelerates the discovery of single-cell data.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1636240</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1636240</link>
        <title><![CDATA[Comprehensive analysis of multi-omics vaccine response data using MOFA and Stabl algorithms]]></title>
        <pubdate>2025-11-13T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Aanya Gupta</author><author>Koji Abe</author><author>Holden T. Maecker</author>
        <description><![CDATA[Introduction: FluPRINT is a multi-omics dataset that measures donors’ protein expression and cell counts across various assays. Donors were also assigned a binary value (0 or 1): they were labeled as high responders (1) if they had a fold change ≥4 of the antibody titer for hemagglutination inhibition (HAI) from day 0 to day 28, and as low responders (0) otherwise. In this project, we used the MOFA and Stabl algorithms to analyze FluPRINT, estimate the population structure from the data, and identify the most important features for predicting response to the vaccine. Methods: The preprocessing of the dataset included removing repeat features, scaling by assay, and removing outliers. Since Stabl does not directly address missing values, features with high amounts of missing values were removed and the remaining were ignored. Results: MOFA identified the top feature in structure extraction as IL neg 2 CD4 pos CD45Ra neg pSTAT5. MOFA explains the variance of the data well while also choosing features that have good significance, as illustrated by their significant p-values (p < 0.05). Stabl found the top feature for explaining the outcome to be CD33− CD3+ CD4+ CD25hiCD127low CD161+ CD45RA+ Tregs, which matched the top result of previously published analysis. MOFA’s features achieved an AUROC of 0.616 (95% CI of 0.426–0.806), and Stabl’s achieved an AUROC of 0.634 (95% CI of 0.432–0.823). Discussion: Our research addresses a key knowledge gap: understanding how these fundamentally different analytical approaches perform when analyzing the same complex dataset. Our exploration evaluates their respective strengths, limitations, and biological insights and provides guidance on using MOFA and Stabl to find the best predictive cell subsets and features for understanding large immunological multi-omics data. The code for this project can be found at https://github.com/aanya21gupta/fluprint.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1630161</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1630161</link>
        <title><![CDATA[Quantitative measures to assess the quality of cellular indexing of transcriptomes and epitopes by sequencing data]]></title>
        <pubdate>2025-09-18T00:00:00Z</pubdate>
        <category>Methods</category>
        <author>Jie Sun</author><author>Robert Morrison</author><author>Soyeon Kim</author><author>Kairuo Yan</author><author>Hyun Jung Park</author>
        <description><![CDATA[Background: Cellular indexing of transcriptomes and epitopes by sequencing (CITE-Seq) is a powerful technique to simultaneously measure gene expression and cell surface protein abundances in individual cells. To obtain accurate and reliable biological findings from CITE-Seq data, it is critical to ensure rigorous quality control (QC). However, no public method has yet been developed for CITE-Seq QC. Results: In this study, we propose the first software package for multi-layered, systemic, and quantitative quality control (CITESeQC). Recognizing the multi-layered nature of CITE-Seq data, CITESeQC performs QC across gene expressions, surface proteins, and their interactions. It systemically evaluates all genes and protein markers assayed in the data and filters out some of them based on individual quality measures. Furthermore, for quantitative QC that enables objective and standardized analyses, CITESeQC quantifies cell type-specific expression of genes and surface proteins using Shannon entropy and correlation-based measures. Finally, to ensure broad applicability, CITESeQC guides users through a simple process that generates a complete markdown report with supporting figures and explanations, requiring minimal user intervention. Conclusion: By quantifying the quality of CITE-Seq data, CITESeQC enables precise characterization of gene expression within cell types and reliable classification of cell types using surface protein markers, thereby enhancing its value for clinical applications.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1641491</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1641491</link>
        <title><![CDATA[TCRscape: a single-cell multi-omic TCR profiling toolkit]]></title>
        <pubdate>2025-09-05T00:00:00Z</pubdate>
        <category>Technology and Code</category>
        <author>Roman Perik-Zavodskii</author><author>Olga Perik-Zavodskaia</author><author>Marina Volynets</author><author>Saleh Alrhmoun</author><author>Sergey Sennikov</author>
        <description><![CDATA[Introduction: Single-cell multi-omics has transformed T-cell biology by enabling the simultaneous analysis of T-cell receptor (TCR) sequences, transcriptomes, and surface proteins at the resolution of individual cells. These capabilities are critical for identifying antigen-specific T-cells and accelerating the development of TCR-based immunotherapies. Methods: Here, we introduce TCRscape, an open-source Python 3 tool designed for high-resolution T-cell receptor clonotype discovery and quantification, optimized for BD Rhapsody™ single-cell multi-omics data. Results: TCRscape integrates full-length TCR sequence data with gene expression profiles and surface protein expression to enable multimodal clustering of αβ and γδ T-cell populations. It also outputs Seurat-compatible matrices, facilitating downstream visualization and analysis in standard single-cell analysis environments. Discussion: By bridging clonotype detection with immune cell transcriptome, proteome, and antigen specificity profiling, TCRscape supports rapid identification of dominant T-cell clones and their functional phenotypes, offering a powerful resource for immune monitoring and TCR-engineered therapeutic development. TCRscape can be found at https://github.com/Perik-Zavodskii/TCRscape/.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1626153</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1626153</link>
        <title><![CDATA[Analysis of histone modifications in key cellular subpopulations in the context of azoospermia using spermatogenic single-cell RNA-seq data]]></title>
        <pubdate>2025-07-18T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Qiu Wang</author><author>Hong Yang</author><author>Fang Li</author><author>Song Ge</author><author>Ling Ji</author><author>Xiaofeng Li</author>
        <description><![CDATA[Introduction: The molecular underpinnings of non-obstructive azoospermia (NOA), a severe form of male infertility characterized by the absence of sperm in the ejaculate, remain unclear. Methods: In this study, we demonstrate the role of histone modifications within specific testicular cell subpopulations in NOA using single-cell RNA sequencing (scRNA-seq) data. Results: Based on scRNA-seq analysis of the data acquired from the Gene Expression Omnibus (GSE149512), we identified nine distinct cell types and revealed significant compositional differences between the NOA and control testicular tissues. In contrast to the high prevalence of spermatogenic cells in the controls, endothelial, testicular interstitial, and vascular smooth muscle cells, as well as macrophages, were enriched in NOA. Furthermore, our analyses revealed considerable enrichment of histone modification-related genes in Leydig cells, peritubular myoid (PTM) cells, and macrophages in the NOA group. HDAC2, a pivotal regulator of histone acetylation, exhibited significant upregulation. Functional pathway analysis implicated these genes in critical biological processes, including nuclear transport, RNA splicing, and autophagy. We quantified the activity of histone modification-related genes using AUCell and identified distinct Leydig cell subpopulations characterized by unique marker genes and functional pathways, underscoring their dual roles in histone modification and spermatogenesis. Additionally, cellular communication analysis via CellChat demonstrated altered interaction dynamics across cell types in NOA, particularly in Leydig and PTM cells, which exhibited enhanced interactions alongside differential activation of the WNT and NOTCH signaling pathways. Discussion: These findings suggest that aberrant histone modifications in specific cellular subpopulations may drive disease progression, highlighting potential targets for diagnostic and therapeutic strategies. This study offers novel insights into the molecular mechanisms of NOA and provides a basis for future research on advanced male reproductive health.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1562410</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1562410</link>
        <title><![CDATA[Optimization of clustering parameters for single-cell RNA analysis using intrinsic goodness metrics]]></title>
        <pubdate>2025-06-11T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Nicolina Sciaraffa</author><author>Antonino Gagliano</author><author>Luigi Augugliaro</author><author>Claudia Coronnello</author>
        <description><![CDATA[Introduction: The accurate clustering of cell subpopulations is a crucial aspect of single-cell RNA sequencing. The ability to correctly subdivide cell subpopulations hinges on the efficacy of unsupervised clustering. Despite the advancements and numerous adaptations of clustering algorithms, the correct clustering of cells remains a challenging endeavor that is dependent on the data in question and on the parameters selected for the clustering process. In this context, the present study aimed to predict the accuracy of clustering methods when varying different parameters by exploiting the intrinsic goodness metrics. Methods: This study utilized three datasets, each originating from a distinct anatomical district and with a ground truth cell annotation. Moreover, the investigation employed two clustering methods: the Leiden and the Deep Embedding for Single-cell Clustering (DESC) algorithm. Firstly, a robust linear mixed regression model has been implemented in order to analyze the impact of clustering parameters on the accuracy. Consequently, fifteen intrinsic measures have been calculated and used to train an ElasticNet regression model in both intra- and cross-dataset approaches to evaluate the possibility of predicting the clustering accuracy. Results and discussion: The first-order interactions demonstrated that the use of the UMAP method for the generation of the neighborhood graph and an increase in resolution have a beneficial impact on accuracy. The impact of the resolution parameter is accentuated by the reduced number of nearest neighbors, resulting in sparser and more locally sensitive graphs, which better preserve fine-grained cellular relationships. Furthermore, it is advisable to test different numbers of principal components, given that this parameter is highly affected by data complexity. This procedure has enabled the effective prediction of clustering accuracy through the utilization of intrinsic metrics. The findings demonstrated that the within-cluster dispersion and the Banfield-Raftery index could be effectively used as proxies for accuracy, for an immediate comparison of different clustering parameter configurations.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1554010</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1554010</link>
        <title><![CDATA[Primer on machine learning applications in brain immunology]]></title>
        <pubdate>2025-04-17T00:00:00Z</pubdate>
        <category>Mini Review</category>
        <author>Niklas Binder</author><author>Ashkan Khavaran</author><author>Roman Sankowski</author>
        <description><![CDATA[Single-cell and spatial technologies have transformed our understanding of brain immunology, providing unprecedented insights into immune cell heterogeneity and spatial organisation within the central nervous system. These methods have uncovered complex cellular interactions, rare cell populations, and the dynamic immune landscape in neurological disorders. This review highlights recent advances in single-cell “omics” data analysis and discusses their applicability for brain immunology. Traditional statistical techniques, adapted for single-cell omics, have been crucial in categorizing cell types and identifying gene signatures, overcoming challenges posed by increasingly complex datasets. We explore how machine learning, particularly deep learning methods like autoencoders and graph neural networks, is addressing these challenges by enhancing dimensionality reduction, data integration, and feature extraction. Newly developed foundation models present exciting opportunities for uncovering gene expression programs and predicting genetic perturbations. Focusing on brain development, we demonstrate how single-cell analyses have resolved immune cell heterogeneity, identified temporal maturation trajectories, and uncovered potential therapeutic links to various pathologies, including brain malignancies and neurodegeneration. The integration of single-cell and spatial omics has elucidated the intricate cellular interplay within the developing brain. This mini-review is intended for wet lab biologists at all career stages, offering a concise overview of the evolving landscape of single-cell omics in the age of widely available artificial intelligence.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1519468</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1519468</link>
        <title><![CDATA[Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation]]></title>
        <pubdate>2025-02-12T00:00:00Z</pubdate>
        <category>Technology and Code</category>
        <author>Mikhail Arbatsky</author><author>Ekaterina Vasilyeva</author><author>Veronika Sysoeva</author><author>Ekaterina Semina</author><author>Valeri Saveliev</author><author>Kseniya Rubina</author>
        <description><![CDATA[Processing biological data is a challenge of paramount importance as the amount of accumulated data has been annually increasing along with the emergence of new methods for studying biological objects. Blind application of mathematical methods in biology may lead to erroneous hypotheses and conclusions. Here we narrow our focus down to a small set of mathematical methods applied upon standard processing of scRNA-seq data: preprocessing, dimensionality reduction, integration, and clustering (using machine learning methods for clustering). Normalization and scaling are standard manipulations for the pre-processing with LogNormalize (natural-log transformation), CLR (centered log ratio transformation), and RC (relative counts) being employed as methods for data transformation. The justification for applying these methods in biology is not discussed in methodological articles. The essential aspect of dimensionality reduction is to identify the stable patterns which are deliberately removed upon mathematical data processing as being redundant, albeit containing important minor details for biological interpretation. There are no established rules for integration of datasets obtained at different sampling times or conditions. Clustering calls for reconsidering its application specifically for biological data processing. The novelty of the present study lies in an integrated approach of biology and bioinformatics to elucidate biological insights upon data processing.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2024.1499514</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2024.1499514</link>
        <title><![CDATA[Editorial: Women in bioinformatics]]></title>
        <pubdate>2024-10-08T00:00:00Z</pubdate>
        <category>Editorial</category>
        <author>Irma Martínez-Flores</author><author>Constanza Cárdenas Carvajal</author><author>Viviana Monje-Galvan</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2024.1417428</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2024.1417428</link>
        <title><![CDATA[A systematic overview of single-cell transcriptomics databases, their use cases, and limitations]]></title>
        <pubdate>2024-07-08T00:00:00Z</pubdate>
        <category>Mini Review</category>
        <author>Mahnoor N. Gondal</author><author>Saad Ur Rehman Shah</author><author>Arul M. Chinnaiyan</author><author>Marcin Cieslik</author>
        <description><![CDATA[Rapid advancements in high-throughput single-cell RNA-seq (scRNA-seq) technologies and experimental protocols have led to the generation of vast amounts of transcriptomic data that populate several online databases and repositories. Here, we systematically examined large-scale scRNA-seq databases, categorizing them based on their scope and purpose, such as general, tissue-specific, disease-specific, cancer-focused, and cell type-focused databases. Next, we discuss the technical and methodological challenges associated with curating large-scale scRNA-seq databases, along with current computational solutions. We argue that understanding scRNA-seq databases, including their limitations and assumptions, is crucial for effectively utilizing this data to make robust discoveries and identify novel biological insights. Such platforms can help bridge the gap between computational and wet lab scientists through user-friendly web-based interfaces needed for democratizing access to single-cell data. These platforms would facilitate interdisciplinary research, enabling researchers from various disciplines to collaborate effectively. This review underscores the importance of leveraging computational approaches to unravel the complexities of single-cell data and offers a promising direction for future research in the field.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2024.1347276</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2024.1347276</link>
        <title><![CDATA[Predicting cell population-specific gene expression from genomic sequence]]></title>
        <pubdate>2024-03-04T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Lieke Michielsen</author><author>Marcel J. T. Reinders</author><author>Ahmed Mahfouz</author>
        <description><![CDATA[Most regulatory elements, especially enhancer sequences, are cell population-specific. One could even argue that a distinct set of regulatory elements is what defines a cell population. However, discovering which non-coding regions of the DNA are essential in which context, and as a result, which genes are expressed, is a difficult task. Some computational models tackle this problem by predicting gene expression directly from the genomic sequence. These models are currently limited to predicting bulk measurements and mainly make tissue-specific predictions. Here, we present a model that leverages single-cell RNA-sequencing data to predict gene expression. We show that cell population-specific models outperform tissue-specific models, especially when the expression profile of a cell population and the corresponding tissue are dissimilar. Further, we show that our model can prioritize GWAS variants and learn motifs of transcription factor binding sites. We envision that our model can be useful for delineating cell population-specific regulatory elements.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2024.1340339</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2024.1340339</link>
        <title><![CDATA[Detecting subtle transcriptomic perturbations induced by lncRNAs knock-down in single-cell CRISPRi screening using a new sparse supervised autoencoder neural network]]></title>
        <pubdate>2024-03-04T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Marin Truchi</author><author>Caroline Lacoux</author><author>Cyprien Gille</author><author>Julien Fassy</author><author>Virginie Magnone</author><author>Rafael Lopes Goncalves</author><author>Cédric Girard-Riboulleau</author><author>Iris Manosalva-Pena</author><author>Marine Gautier-Isola</author><author>Kevin Lebrigand</author><author>Pascal Barbry</author><author>Salvatore Spicuglia</author><author>Georges Vassaux</author><author>Roger Rezzonico</author><author>Michel Barlaud</author><author>Bernard Mari</author>
        <description><![CDATA[Single-cell CRISPR-based transcriptome screens are potent genetic tools for concomitantly assessing the expression profiles of cells targeted by a set of guide RNAs (gRNAs), and inferring target gene functions from the observed perturbations. However, due to various limitations, this approach lacks sensitivity in detecting weak perturbations and is reliable mainly when studying master regulators such as transcription factors. To overcome the challenge of detecting subtle gRNA-induced transcriptomic perturbations and classifying the most responsive cells, we developed a new supervised autoencoder neural network method. Our Sparse supervised autoencoder (SSAE) neural network provides selection of both relevant features (genes) and actual perturbed cells. We applied this method to an in-house single-cell CRISPR-interference-based (CRISPRi) transcriptome screening (CROP-seq) focusing on a subset of long non-coding RNAs (lncRNAs) regulated by hypoxia, a condition that promotes tumor aggressiveness and drug resistance, in the context of lung adenocarcinoma (LUAD). The CROP-seq library of validated gRNAs against a subset of lncRNAs and, as positive controls, HIF1A and HIF2A, the two main transcription factors of the hypoxic response, was transduced in A549 LUAD cells cultured in normoxia or exposed to hypoxic conditions for 3, 6 or 24 h. We first validated the SSAE approach on HIF1A and HIF2 by confirming the specific effect of their knock-down during the temporal switch of the hypoxic response. Next, the SSAE method was able to detect stable short hypoxia-dependent transcriptomic signatures induced by the knock-down of some lncRNA candidates, outperforming previously published machine learning approaches. This proof of concept demonstrates the relevance of the SSAE approach for deciphering weak perturbations in single-cell transcriptomic data readout as part of CRISPR-based screening.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2023.1211819</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2023.1211819</link>
        <title><![CDATA[Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets]]></title>
        <pubdate>2023-08-10T00:00:00Z</pubdate>
        <category>Methods</category>
        <author>Wanxin Li</author><author>Jules Mirone</author><author>Ashok Prasad</author><author>Nina Miolane</author><author>Carine Legrand</author><author>Khanh Dao Duc</author>
        <description><![CDATA[Conventional dimensionality reduction methods like Multidimensional Scaling (MDS) are sensitive to the presence of orthogonal outliers, leading to significant defects in the embedding. We introduce a robust MDS method, called DeCOr-MDS (Detection and Correction of Orthogonal outliers using MDS), based on the geometry and statistics of simplices formed by data points, that detects orthogonal outliers and subsequently reduces dimensionality. We validate our method using synthetic datasets, and further show how it can be applied to a variety of large real biological datasets, including cancer image cell data, human microbiome project data and single cell RNA sequencing data, to address the task of data cleaning and visualization.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2023.1120290</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2023.1120290</link>
        <title><![CDATA[Gene representation in scRNA-seq is correlated with common motifs at the 3′ end of transcripts]]></title>
        <pubdate>2023-05-15T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Xinling Li</author><author>Greg Gibson</author><author>Peng Qiu</author>
        <description><![CDATA[One important characteristic of single-cell RNA sequencing (scRNA-seq) data is its high sparsity, where the gene-cell count data matrix contains a high proportion of zeros. The sparsity has motivated widespread discussions on dropouts and missing data, as well as imputation algorithms for scRNA-seq analysis. Here, we aim to investigate whether there exist genes that are more prone to being under-detected in scRNA-seq, and, if so, what commonalities those genes may share. From public data sources, we gathered paired bulk RNA-seq and scRNA-seq data from 53 human samples, which were generated in diverse biological contexts. We derived pseudo-bulk gene expression by averaging the scRNA-seq data across cells. Comparisons of the paired bulk and pseudo-bulk gene expression profiles revealed that there indeed exists a collection of genes that are frequently under-detected in scRNA-seq compared to bulk RNA-seq. This result was robust to randomization when unpaired bulk and pseudo-bulk gene expression profiles were compared. We performed a motif search on the last 350 bp of the identified genes, and observed an enrichment of the poly(T) motif. The poly(T) motif toward the tails of those genes may be able to form hairpin structures with the poly(A) tails of their mRNA transcripts, making it difficult for their mRNA transcripts to be captured during scRNA-seq library preparation, which is a mechanistic conjecture of why certain genes may be more prone to being under-detected in scRNA-seq.]]></description>
      </item>
      </channel>
    </rss>