<?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
      <channel xmlns:content="http://purl.org/rss/1.0/modules/content/">
        <title>Frontiers in Bioinformatics | New and Recent Articles</title>
        <link>https://www.frontiersin.org/journals/bioinformatics</link>
        <description>RSS Feed for Frontiers in Bioinformatics | New and Recent Articles</description>
        <language>en-us</language>
        <generator>Frontiers Feed Generator,version:1</generator>
        <pubDate>2026-05-04T21:30:52.266+00:00</pubDate>
        <ttl>60</ttl>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1792877</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1792877</link>
        <title><![CDATA[Transcriptomics-guided discovery of Interleukin-6 modulators from Bacillus subtilis metabolites in type 2 diabetes mellitus]]></title>
        <pubdate>2026-05-04T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Tarsha Muthukumar</author><author>Sidharth Kumar Nanda Kumar</author><author>Vasundra Vasudevan</author><author>Madhana Priya Nanda Kumar</author><author>Thirumal Kumar D.</author><author>Amudha Govindarajan</author><author>Magesh Ramasamy</author>
        <description><![CDATA[Type 2 diabetes mellitus (T2DM) is characterized by chronic metabolic dysfunction and low-grade inflammation. This study aimed to identify inflammation-associated molecular targets in T2DM and to computationally evaluate the therapeutic potential of Bacillus subtilis–derived metabolites targeting the key inflammatory cytokine IL6. Publicly available human RNA-sequencing datasets were retrieved from the NCBI Gene Expression Omnibus and analyzed using GEO2R to compare lean, obese, and T2DM groups. Common differentially expressed genes (DEGs) were identified and functionally enriched, with IL6 prioritized as a central inflammatory target. The IL6 protein structure was prepared for structure-based screening. Fifty-five B. subtilis metabolites were screened using PyRx, followed by ADME and toxicity prediction. Top-ranked compounds were further evaluated using molecular docking and 500 ns molecular dynamics simulations, with metformin as a reference. Free energy landscape (FEL) and dynamic cross-correlation matrix (DCCM) analyses were used to assess ligand-induced conformational stability and internal protein dynamics. A total of 179 common DEGs were identified; these were enriched in cytokine-mediated inflammatory pathways, and IL6 emerged as a consistently upregulated hub gene. Three metabolites showed favorable pharmacokinetics, low predicted toxicity, and stronger predicted binding affinities for IL6 than metformin. Docking revealed favorable interactions with key IL6 residues, and molecular dynamics simulations indicated sustained complex stability. FEL and DCCM analyses showed ligand-dependent differences in conformational stability while internal dynamics were preserved. This integrative transcriptomics and structure-based analysis highlights B. subtilis metabolites as computationally predicted IL6-binding compounds relevant to T2DM-associated inflammation and identifies them as promising candidates for further investigation toward potential healthcare and therapeutic applications.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1803111</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1803111</link>
        <title><![CDATA[Delineating novel diagnostic biomarkers and therapeutic targets for oral submucosal fibrosis: an integrative multi-omics and machine learning approach]]></title>
        <pubdate>2026-05-04T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Chinmay Nitin Mokal</author><author>Piyush Agrawal</author>
        <description><![CDATA[Background: Oral submucosal fibrosis (OSF) is a chronic, progressive disorder of the oral cavity and oropharynx caused by areca nut chewing. OSF is characterized by severe symptoms such as an intense burning sensation and restricted mouth opening. Because the molecular basis of this multifactorial disease is poorly understood, novel biomarkers and therapeutic targets are needed. Method: We downloaded three RNA-seq, two microarray, one epigenomic, and one single-cell RNA-seq dataset from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were identified using DESeq2. Gene enrichment, immune cell infiltration, and protein-protein interaction analyses, among others, were then performed. Machine learning models were developed using all DEGs and the top five selected features, with leave-one-out cross-validation. Independent validation was performed on the two microarray datasets using appropriate statistical measures. Epigenetic analysis identified hyper- and hypomethylated genes based on delta-beta values, and an integrative analysis of the transcriptome and methylome was performed to obtain key biomarkers. Single-cell analysis was performed to identify the cell types showing higher DEG expression. Result: DESeq2 analysis identified 29 upregulated and 15 downregulated DEGs. Upregulated DEGs were enriched in inflammatory, metabolic, and signaling processes, whereas downregulated DEGs were largely associated with metabolic processes. Immune cell enrichment analysis using CIBERSORTx showed higher enrichment of T cells, mast cells, and macrophages in OSF patients. The two independent microarray datasets showed a similar DEG expression pattern, validating these findings. Using the top five features, the Random Forest model achieved an AUROC of 0.99 and an AUPRC of 0.99. ROC analysis further showed that the DEGs discriminate OSF patients from healthy individuals with high AUROC. Integrative analysis of methylation and transcriptomic data identified 11 genes as potential diagnostic biomarkers and therapeutic targets. Finally, single-cell analysis revealed higher DEG expression in keratinocytes, epithelial cells, and dendritic cells. Conclusion: Integrative analysis identified 11 genes as potential early diagnostic biomarkers and therapeutic targets for OSF. The study also offers preliminary insight into mechanisms that may underlie progression to oral cancer. All code and ML models are available in our GitHub repository at https://github.com/agrawalpiyush-srm/OSF.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1811161</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1811161</link>
        <title><![CDATA[Discovering TEAD4 modulators for hepatocellular carcinoma: a GAN-enabled generative modelling framework]]></title>
        <pubdate>2026-05-04T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Varshni Premnath</author><author>Ramanathan Karuppasamy</author><author>Jayakumar Kaliappan</author><author>Shanthi Veerappapillai</author>
        <description><![CDATA[Introduction: Liver diseases continue to impose a major global health burden, and therapeutic progress is constrained by the limited availability of validated small-molecule modulators. TEAD4, a central Hippo-YAP effector, has emerged as a key regulator of hepatic regeneration, survival, and disease progression, yet it remains pharmacologically underexplored because few inhibitors have been experimentally confirmed. Critically, the small number of known active compounds restricts effective supervised learning, necessitating data augmentation strategies capable of expanding the TEAD4-relevant chemical space. Methods: To address this, we developed an integrative computational framework in which a conditional generative adversarial network (GAN) was trained on QikProp-derived molecular descriptors to generate chemically realistic synthetic samples and mitigate class imbalance. This GAN-driven augmentation enabled the construction of a robust activity prediction model. XGBoost was selected as the classifier because of its strong performance on structured descriptor datasets and its ability to capture complex nonlinear relationships while generalizing well. The augmented dataset was used to train the XGBoost classifier for activity prediction and to screen DrugBank compounds, producing a focused set of high-confidence candidates. Shortlisted hits were refined using structure-based evaluation, toxicity filtering, and anticancer sensitivity prediction. Results: Quantum chemical analysis identified DB00169 (cholecalciferol) as a potential TEAD4-binding candidate, supported by combined structural, dynamic, and electronic analyses. Molecular dynamics simulations further supported the stability of the TEAD4–ligand complex, indicating compact structural behaviour and thermodynamically favourable conformational states. Discussion: Overall, this work demonstrates that coupling GAN-based molecular augmentation with XGBoost classification and molecular simulations provides a scalable strategy for identifying biologically meaningful TEAD4 modulators, supporting TEAD4-targeted drug discovery across liver diseases.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1811916</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1811916</link>
        <title><![CDATA[On the optimization of copy number variations representation in pangenome graphs]]></title>
        <pubdate>2026-05-01T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Mirko Coggi</author><author>Lorenzo Basile</author><author>Beatrice Branchini</author><author>Gabriele Amodeo</author><author>Guido Walter Di Donato</author><author>Marco D. Santambrogio</author>
        <description><![CDATA[Graph-based pangenome references often misrepresent Copy Number Variations (CNVs) and Variable Number Tandem Repeats (VNTRs) as alternative acyclic paths, which hinders downstream analyses, degrades alignment behavior, and reduces the interpretability of graph visualizations. To address this, we introduce PANPHORTE, a topology-optimization methodology that detects repeat-driven misrepresentations within superbubbles and rewrites them into structures that more faithfully reflect the underlying biology. Given a pangenome graph annotated with haplotype paths, PANPHORTE identifies repetitive elements inside superbubbles, isolates repeat sequences shared across distinct subpaths, and refactors the graph by splitting nodes and introducing explicit cycles, encoding CNVs and VNTRs without loss of information. We provide a C++ command-line implementation of the proposed approach, together with a complementary pipeline that applies PANPHORTE followed by GFAffix to further reduce redundancy in regions not affected by repeat-induced artifacts. We evaluate PANPHORTE on synthetic and real pangenome graphs, showing memory footprint reductions of up to 71.69%, improvements in exact read matches of up to 34.4%, and substantially clearer visual identification of repeated loci.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1713736</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1713736</link>
        <title><![CDATA[Automated segmentation of hepatic vessels and lobules in whole-slide images using U-net models]]></title>
        <pubdate>2026-04-30T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Mehul Bafna</author><author>Matthias König</author><author>Sylvia Saalfeld</author><author>Vladimira Moulisova</author><author>Vaclav Liska</author><author>Uta Dahmen</author><author>Mohamed Albadry</author>
        <description><![CDATA[Automated analysis of hepatic vascular structures and lobules in whole-slide histological images is critical for accurate and timely morphometric evaluation and for advancing computational liver histology. Nonetheless, the intricate tissue morphology, variability in staining, and the need for high-resolution images pose substantial challenges to segmentation precision. We present a robust deep-learning pipeline that uses adaptive patch extraction and specialized nnU-Net architectures to segment vessels, bile ducts, and lobules in Glutamine Synthetase- and Picro-Sirius-Red-stained porcine liver sections. Our architecture incorporates a weight-boosted nnU-Net framework with an adaptive, performance-based weight adjustment mechanism to manage class imbalance and improve the detection of smaller vascular structures. The model was trained on four annotated whole-slide images and tested on eight additional independent slides. Geometric and intensity-based data transformations enhanced the robustness and generalizability of the segmentation models. Five-fold cross-validation and evaluation on independent test datasets yielded Dice similarity scores of 0.968 for lobules, 0.795 for central veins, 0.895 for hepatic arteries, 0.665 for portal veins, and 0.694 for bile ducts. The pipeline also supports comprehensive morphometric analysis of structural parameters, including the number and size (diameter, area) of vascular structures, bile ducts, and lobules; for example, hepatic artery diameters range from 20 to 90 µm. These findings underscore the practical relevance of adaptable segmentation frameworks for computational histological analysis of liver tissue.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1764859</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1764859</link>
        <title><![CDATA[QSAR and scaffold-based optimization of HMGR inhibitors using cheminformatics and machine learning]]></title>
        <pubdate>2026-04-30T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Priya Antony</author><author>Bincy Baby</author><author>Ranjit Vijayan</author>
        <description><![CDATA[Atherosclerosis, driven by elevated cholesterol levels, remains a major risk factor for cardiovascular disease. 3-Hydroxy-3-methylglutaryl-coenzyme A reductase (HMGR), the rate-limiting enzyme in cholesterol biosynthesis, is a validated therapeutic target. Statins are an effective and widely prescribed class of HMGR inhibitors; however, their prolonged use can cause adverse side effects. This highlights the need for novel inhibitors with improved safety and efficacy. In this study, a comprehensive cheminformatics and machine learning approach was applied to identify and optimize potential HMGR inhibitors. A curated dataset from the ChEMBL database was analyzed through physicochemical descriptor profiling, exploratory data analysis, and principal component analysis (PCA). Murcko scaffold extraction revealed that active molecules clustered around complex cyclic frameworks enriched in aromatic and nitrogen-containing motifs. Quantitative structure–activity relationship (QSAR) models were then developed using several machine learning algorithms; gradient boosting and XGBoost regressors performed best, with a tuned XGBoost model achieving a cross-validated R² of 0.70. Ligand-based R-group enumeration further refined promising cores, enhancing hydrogen bonding, polarity, and multiparameter optimization (MPO) scores. Four scaffolds were successfully optimized, with improved MPO values. By integrating cheminformatics and machine learning, this study provides a systematic pipeline that highlights promising scaffolds with optimized drug-likeness for the development of novel HMGR inhibitors.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1841924</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1841924</link>
        <title><![CDATA[Correction: An integrated automated deep learning framework for annotating tumor-infiltrating lymphocytes in lung adenocarcinoma pathology]]></title>
        <pubdate>2026-04-30T00:00:00Z</pubdate>
        <category>Correction</category>
        <author>Xia Li</author><author>Kang-Lai Wei</author><author>Zhao-Quan Huang</author><author>Zi-Yan Huang</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1784287</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1784287</link>
        <title><![CDATA[Comprehensive genomics, systems, and structural assessment for novel target identification in penicillin-resistant Streptococcus pneumoniae]]></title>
        <pubdate>2026-04-29T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Avani Panickar</author><author>Suvitha Anbarasu</author><author>Anand Manoharan</author><author>Sudha Ramaiah</author><author>Anand Anbarasu</author>
        <description><![CDATA[Introduction: The increasing prevalence of penicillin-resistant Streptococcus pneumoniae (PRSP) has compromised the efficacy of conventional β-lactam therapies, and the limited reliability of penicillin-binding proteins (PBPs) as drug targets further underscores the urgent need to explore novel alternatives. The current study employs an in silico strategy that integrates genomics, genome-wide association studies (GWASs), network analyses, and membrane protein simulations to systematically identify and prioritize new antimicrobial targets. Methodology: A total of 665 PRSP genomes from Indian clinical isolates collected between 1996 and 2022 were analyzed. High-quality genome assemblies were annotated and used for pangenome construction and GWASs to identify gene clusters associated with penicillin resistance. Candidate genes were further prioritized through essentiality screening, functional annotation, subcellular localization prediction, evolutionary conservation analysis, druggability assessment, and structural modeling. Results: Integrated analysis identified OppC2, an essential oligopeptide permease of the ABC transporter family, as a highly favorable drug target. Network and functional enrichment analyses linked OppC2 to transport-associated pathways relevant to pneumococcal survival and adaptation. Structural modeling yielded a high-confidence protein model with a druggable binding pocket, and molecular dynamics simulations confirmed the stability of the structure in a physiological membrane environment. Conclusion: This comprehensive approach enabled the identification of conserved, essential, and accessible drug targets within PRSP populations, providing an adaptable framework to guide next-generation antimicrobial target identification beyond traditional PBPs.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1794098</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1794098</link>
        <title><![CDATA[Machine learning-based determination of sex-related bladder cancer biomarkers]]></title>
        <pubdate>2026-04-29T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Joseph R. Pizzi</author><author>Image Adhikari</author><author>Prakyat Prakash</author><author>Hangchuan Shi</author><author>Hiroshi Miyamoto</author><author>Feng Cui</author>
        <description><![CDATA[Introduction: Bladder cancer exhibits sex-specific behavior, occurring more frequently in males but progressing to advanced stages more commonly in females. Activation of sex hormone receptors may explain these differences, but the exact genetic drivers remain poorly understood. Furthermore, current bladder cancer biomarkers show inconsistent sensitivity and specificity in practice, making early diagnosis a challenge. Methods: This study approached bladder cancer biomarker discovery by applying machine learning techniques to sex- and disease-stratified RNA-seq data. Training sets limited to differentially expressed genes were subjected to four feature selection methods: ranking by adjusted p-value from differential gene expression analysis, recursive feature elimination with a support vector machine, logistic regression, and an optimized random forest procedure. Gene panels were compared and aggregated across selection strategies and cross-validation folds to identify robust biomarkers of sex-specific bladder cancer development and progression. Results: When applied to unseen datasets and limited to 50 or fewer genes, the male- and female-specific panels achieved areas under the receiver operating characteristic curve of 0.932 and 0.914, respectively, in distinguishing bladder cancer samples from non-tumor controls. In terms of enriched pathways, the male panel was associated with cell interactions and altered PI3K-AKT signaling, while the female panel was more closely connected to extracellular matrix reorganization. The panel differentiating male and female tumors performed worse on external datasets than the sex-specific panels but still contained relevant genes. Discussion: Genes related to sex hormones or sex chromosomes, such as PRAC1 and PCDH11Y, were identified as high-impact predictors of male tumor development. In the female-specific panel, genes linked to aberrant androgen signaling across tumor types, such as the androgen receptor, PLXNA1, USP54, and PMEPA1, were influential. These results offer potential targets for further in vitro and in vivo experimentation and provide a framework for constructing high-performance gene panels related to sex-specific bladder cancer biology.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1822250</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1822250</link>
        <title><![CDATA[In silico identification of phytocompounds derived from Glycyrrhiza glabra as potential inhibitors of actin assembly-inducing protein in Listeria monocytogenes: a virtual screening and molecular dynamics study]]></title>
        <pubdate>2026-04-28T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Deepasree K.</author><author>Sudha Ramaiah</author>
        <description><![CDATA[Introduction: Natural compounds present in medicinal plants have contributed substantially to drug development owing to their diverse therapeutic properties. One important application of phytocompounds is suppressing the survival of pathogenic microbes that withstand current treatment regimens. Methods: Using an in silico approach, this study evaluated the potential inhibitory action of 106 liquorice (Glycyrrhiza glabra) phytocompounds against ActA, a virulence protein of Listeria monocytogenes. The ActA protein structure was first modelled, and virtual screening was then performed to identify potential candidates. Results: Xambioona and Licoisoflavone B showed binding affinities of −10.4 kcal/mol and −8.7 kcal/mol, respectively, with the AlphaFold model of the ActA protein. Discussion: A 100 ns molecular dynamics (MD) simulation and binding free energy analysis identified Licoisoflavone B as a promising phytocompound owing to its overall conformational stability.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1765472</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1765472</link>
        <title><![CDATA[Repurposing the angiotensin II receptor blocker valsartan to inhibit penicillin-binding protein 3 and its mutants in Haemophilus influenzae: a comprehensive in silico approach]]></title>
        <pubdate>2026-04-28T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Srujal Kacha</author><author>Janani Arun</author><author>Neelu Nargund</author><author>Tushar Joshi</author><author>Shalini Mathpal</author><author>Sudha Ramaiah</author><author>Anand Anbarasu</author>
        <description><![CDATA[Introduction: Ampicillin-resistant Haemophilus influenzae (H. influenzae) was designated a medium-priority bacterial pathogen by the World Health Organization (WHO) in 2024. This pathogen causes a wide range of infections, including sinusitis, acute otitis media, and pneumonia, as well as severe and life-threatening conditions such as bacteremia, meningitis, and epiglottitis. In this context, drug repurposing has emerged as an effective strategy: because the pharmacokinetic properties and safety profiles of approved drugs are already well established, development can proceed faster than with conventional drug discovery approaches. Methods: In the current study, U.S. Food and Drug Administration (FDA)-approved drugs with structural similarity to ampicillin were filtered and evaluated using in silico approaches, and their pharmacokinetic properties and antimicrobial potential were assessed. Molecular docking and simulation studies were conducted to evaluate binding affinities toward wild-type penicillin-binding protein 3 (PBP3-WT) and its common mutants (PBP3-N526K and PBP3-R517H). Results: The angiotensin II receptor blocker valsartan showed strong binding affinity toward all three target proteins, with values of −11.8 kcal/mol for PBP3-WT, −11.4 kcal/mol for PBP3-N526K, and −11.1 kcal/mol for PBP3-R517H. The drug also formed strong intermolecular interactions and maintained stable binding with all three PBP3 variants during molecular dynamics simulations. Discussion: Based on these findings, valsartan is proposed as a potential PBP3 inhibitor targeting H. influenzae. The results support its candidacy for drug repurposing; however, further in vitro investigations are recommended to experimentally validate its antimicrobial activity.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1803572</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1803572</link>
        <title><![CDATA[Public health risk stratification using hybrid machine learning: a reproducible analysis of performance, stability, and risk attribution]]></title>
        <pubdate>2026-04-23T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Alejandro Cabrera-Andrade</author><author>Ana Karina Zambrano</author><author>Joselin García-Ortiz</author><author>William Villegas-Ch</author>
        <description><![CDATA[Risk stratification in public health involves organizing heterogeneous health-related signals into consistent representations that support population-level analysis. In large-scale datasets such as the National Health and Nutrition Examination Survey (NHANES) and the Behavioral Risk Factor Surveillance System (BRFSS), the integration of clinical, biometric, behavioral, and self-reported variables introduces structural variability that challenges conventional modeling approaches. This study proposes a hybrid learning framework that combines linear and nonlinear components to analyze induced risk representations derived from multidimensional health data. The model is evaluated on NHANES 2017–2018, BRFSS 2019, and an Integrated Public Health Dataset constructed through semantic harmonization of both sources. The experimental design is based on a controlled formulation in which a continuous risk index is constructed from the available variables and discretized into ordinal classes using quantiles, enabling systematic analysis of how models approximate structured partitions of the input space rather than predicting independent clinical outcomes. The results show that the hybrid scheme maintains consistent macro F1 and macro ROC-AUC values across all scenarios, with low fold-to-fold variability; this reflects the regularity of the induced class structure rather than predictive generalization. Attribution analysis reveals that the organization of the risk representation varies with the nature of the data: contributions are concentrated in clinical signals, distributed across behavioral variables, and intermediate in the integrated dataset. These findings demonstrate that hybrid schemes provide a stable and interpretable framework for analyzing the structural organization of risk in heterogeneous public health data.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1821804</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1821804</link>
        <title><![CDATA[DEP-track: a motion-aware framework for large-scale cell tracking and crossover frequency estimation in dielectrophoresis]]></title>
        <pubdate>2026-04-23T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Sena Lee</author><author>Seungyeop Choi</author><author>Yerin Lee</author><author>Hyunmin Bae</author><author>Junghun Han</author><author>Yoon Suk Kim</author><author>Sang Woo Lee</author><author>Sejung Yang</author>
        <description><![CDATA[Precise and scalable analysis of single-cell responses under dielectrophoresis (DEP) remains challenging, particularly in long-term experiments involving frequency modulation and dense cell populations. Conventional DEP workflows rely heavily on manual trajectory inspection or repeated measurements, limiting throughput, reproducibility, and statistical power. Here, we present DEP-Track, a motion-aware computational framework for automated, large-scale trajectory preservation and crossover frequency estimation from frequency-modulated DEP microscopy data, where the crossover frequency is defined as the frequency at which the direction of DEP-induced cell motion reverses. The framework integrates anchor-free cell detection with motion-aware trajectory association to maintain single-cell identity across abrupt, polarity-induced motion transitions over tens of thousands of frames. By unifying velocity-based estimation at fixed frequencies with trajectory-based estimation under continuous frequency modulation, DEP-Track automatically extracts statistically consistent single-cell crossover frequency estimates from repeated crossover events within a single experiment. In long-term time-lapse imaging experiments (13,200 frames), hundreds of cells were continuously tracked, enabling population-scale analysis without repeated experimental runs. Importantly, this study focuses exclusively on estimating the crossover frequency at the single-cell level. The estimated crossover frequencies agreed closely with conventional analysis workflows and previously reported measurements, supporting the accuracy and reproducibility of the analysis. By transforming DEP analysis into a scalable and reproducible computational workflow, DEP-Track establishes a framework for high-throughput dielectric phenotyping based on crossover frequency.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1810235</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1810235</link>
        <title><![CDATA[A machine learning-derived genomic dataset from bacteria frequently reported as probiotics]]></title>
        <pubdate>2026-04-22T00:00:00Z</pubdate>
        <category>Brief Research Report</category>
        <author>Diego Lucas Neres Rodrigues</author><author>Pedro Alexandre Sodrzeieski</author><author>Sandrine Auger</author><author>Jean-Marc Chatel</author><author>Ana Maria Benko-Iseppon</author><author>Vasco Azevedo</author><author>Siomar de Castro Soares</author><author>Flávia Figueira Aburjaile</author>
        <description><![CDATA[Probiotics are live microorganisms that have been widely investigated for their association with beneficial host outcomes, particularly in the context of gut-associated microbial communities. Despite an extensive literature, probiotic effects are recognized as strain-specific and highly context-dependent, which limits the identification of universal genetic determinants of probiosis. In this study, we present a machine learning-derived genomic dataset generated from comparative analyses of bacterial genomes belonging to taxa frequently reported as probiotics and reference gut-associated bacteria. Using pangenomic analysis combined with supervised machine learning approaches, including Random Forest, Support Vector Machine, and Logistic Regression, we extracted discriminative genomic features from large-scale genome data. The resulting dataset comprises 1,072 non-redundant protein-coding sequences, accompanied by gene presence-absence matrices and functional annotations. These features should not be interpreted as causal determinants of probiotic functionality, but rather as genomic patterns associated with bacterial taxa commonly used as probiotics, which may also reflect taxonomic and ecological signatures. All data and scripts used in this study are publicly available through an open-access repository, providing a reusable resource for exploratory analyses, comparative genomics, and methodological benchmarking in probiogenomics and microbial genomics. The final dataset, hereafter called ProbioSML, is available at https://doi.org/10.5281/zenodo.14181443.]]></description>
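        <content:encoded><![CDATA[The gene presence-absence matrices mentioned above can be sketched as follows. This minimal example assumes each genome has already been reduced to a set of gene-family (pangenome cluster) identifiers. The function names, family identifiers and prevalence-difference score are hypothetical and only illustrate the data layout. The score is a naive univariate filter. It is not the Random Forest, Support Vector Machine or Logistic Regression modelling used in the study.

```python
from typing import Dict, List, Set, Tuple


def presence_absence(genome_families: Dict[str, Set[str]]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a binary genome x gene-family matrix (1 = family present)."""
    genomes = sorted(genome_families)
    families = sorted(set().union(*genome_families.values()))
    matrix = [[1 if fam in genome_families[g] else 0 for fam in families] for g in genomes]
    return genomes, families, matrix


def prevalence_difference(genomes: List[str], families: List[str], matrix: List[List[int]],
                          group_a: Set[str]) -> Dict[str, float]:
    """Prevalence of each family in group_a minus its prevalence in the other genomes."""
    in_a = [i for i, g in enumerate(genomes) if g in group_a]
    rest = [i for i, g in enumerate(genomes) if g not in group_a]
    if not in_a or not rest:
        raise ValueError("both groups must contain at least one genome")
    scores: Dict[str, float] = {}
    for j, fam in enumerate(families):
        prev_a = sum(matrix[i][j] for i in in_a) / len(in_a)
        prev_rest = sum(matrix[i][j] for i in rest) / len(rest)
        scores[fam] = prev_a - prev_rest
    return scores


if __name__ == "__main__":
    data = {
        "genome_A": {"fam1", "fam2"},  # hypothetical genomes and families
        "genome_B": {"fam1", "fam3"},
        "genome_C": {"fam3"},
    }
    g, f, m = presence_absence(data)
    print(f, m)
    print(prevalence_difference(g, f, m, group_a={"genome_A", "genome_B"}))
```

Prevalence differences between taxa also track phylogeny. Like the features in the dataset itself, such a score is best read as an association rather than a causal determinant.]]></content:encoded>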
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1803237</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1803237</link>
        <title><![CDATA[How benchmarking of bioinformatics tools is essential for informed workflow selection: a case study on SARS-CoV-2 subgenomic RNA detection]]></title>
        <pubdate>2026-04-22T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Gabriele Leoni</author><author>Mauro Petrillo</author><author>Man-Hung Eric Tang</author><author>Soren Alexandersen</author>
        <description><![CDATA[Introduction: Selecting appropriate bioinformatics tools is critical for accurate and reproducible analysis, particularly in support of genomic surveillance and molecular biomarker monitoring. The importance of these analyses is underscored by the need for effective public health responses to emerging diseases such as SARS-CoV-2. Methods: Using the detection of SARS-CoV-2 subgenomic RNAs (sgRNAs) as a case study, we show the importance of systematic benchmarking in selecting optimal workflows. We generated 25 synthetic Illumina datasets simulating both shotgun and amplicon sequencing strategies and additionally analyzed a real-world wastewater dataset. Using these datasets, we assessed the influence of key variables, including mutation profiles, read lengths, aligner choice, and primer design for targeted sequencing. Results: Performance varied substantially. Common tools developed to identify sgRNAs struggled with shotgun data and were sensitive to mutations depending on the chosen aligner, while amplicon sequencing improved detection sensitivity, although aligner and primer design choices still significantly affected outcomes. Discussion: Our results highlight the need for benchmarking steps and analyses to inform workflow selection. Without such evaluations, researchers risk drawing inaccurate conclusions from suboptimal workflows. This case study underscores the value of context-aware tool selection and encourages standardised benchmarking practices to ensure reproducibility and reliability in bioinformatics analysis, particularly in evidence-based decision-making environments such as public health and policymaking.]]></description>
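        <content:encoded><![CDATA[Benchmarking against synthetic datasets works because the true set of sgRNAs in each simulated sample is known, so each tool's calls can be scored directly. Below is a minimal sketch. It assumes sgRNA calls are reduced to sets of labels, for example the ORF whose subgenomic transcript was detected. The function name and labels are illustrative, and the metrics used in the study may differ.

```python
from typing import Dict, Set


def score_calls(truth: Set[str], detected: Set[str]) -> Dict[str, float]:
    """Precision, recall (sensitivity) and F1 of detected sgRNAs against a known truth set."""
    tp = len(truth & detected)  # simulated sgRNAs that were recovered
    fp = len(detected - truth)  # calls with no simulated counterpart
    fn = len(truth - detected)  # simulated sgRNAs that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    simulated = {"S", "ORF3a", "E", "M", "N"}  # hypothetical ground truth for one dataset
    called = {"S", "M", "N", "ORF9b"}          # hypothetical output of one workflow
    print(score_calls(simulated, called))
```

Reporting precision and recall separately, rather than a single score, distinguishes missed sgRNAs from spurious calls. Repeating the scoring per dataset then shows how mutation profile, read length, aligner or primer design shifts each metric.]]></content:encoded>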
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1748364</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1748364</link>
        <title><![CDATA[An automated cell-tracking pipeline for the analysis of neutrophil dynamics]]></title>
        <pubdate>2026-04-21T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Chen Li</author><author>Wilson W. C. Yiu</author><author>Wanbin Hu</author><author>Herman P. Spaink</author><author>Lu Cao</author><author>Fons J. Verbeek</author>
        <description><![CDATA[Neutrophils play a key role in the innate immune system. They act as the primary line of defense when bacteria, viruses, or other harmful foreign particles invade the host. Accurate measurement of neutrophil movement, including velocity, direction, and displacement, is crucial for studying the regulation of cell migration behavior. Cell tracking is a key technology for quantifying these measurements. In this article, we developed a cell-tracking pipeline comprising cell segmentation, cell motion tracking between two frames, and trajectory linkage. As a starting point, we collected time-lapse sequences of neutrophils using a confocal microscope. We pre-processed each frame in the time-lapse sequence to improve image quality through denoising, smoothing, and contrast enhancement. Subsequently, a deep learning model, U-Net, was used to segment cells in each image frame. U-Net was used again to track cells between two adjacent frames by calculating score matrices representing the posterior probability of linkage. An extended Viterbi algorithm was then applied to find optimal trajectories based on the score matrices generated by the U-Net. The results demonstrate that our pipeline outperforms other representative linkage methods used in cell tracking, providing a robust, practical solution for the challenging regime of highly motile cells in vivo.]]></description>
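        <content:encoded><![CDATA[The linkage step can be illustrated with the standard Viterbi recursion over per-frame score matrices. This is a minimal single-trajectory sketch under stated assumptions. `scores[t][i][j]` is taken to be the posterior probability that cell i in frame t links to cell j in frame t + 1. The function returns the path that maximizes the product of these probabilities, computed as a sum of log-probabilities. It does not reproduce the pipeline's extended Viterbi algorithm, which resolves many trajectories, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def viterbi_track(scores: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Most probable single-cell path through consecutive frames.

    scores[t][i][j] is assumed to be the posterior probability that cell i in
    frame t links to cell j in frame t + 1. Returns one cell index per frame
    (len(scores) + 1 entries) maximizing the product of linkage probabilities.
    """
    if not scores:
        raise ValueError("need at least one score matrix")
    eps = 1e-12  # floor to avoid log(0) for impossible links
    best = [0.0] * len(scores[0])  # any cell in frame 0 may start the track
    backpointers: List[List[int]] = []
    for matrix in scores:
        n_next = len(matrix[0])
        new_best = [-math.inf] * n_next
        pointer = [0] * n_next
        for i, row in enumerate(matrix):
            for j, p in enumerate(row):
                candidate = best[i] + math.log(max(p, eps))
                if candidate > new_best[j]:
                    new_best[j] = candidate
                    pointer[j] = i
        backpointers.append(pointer)
        best = new_best
    # Trace back from the highest-scoring cell in the last frame.
    j = max(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointer in reversed(backpointers):
        j = pointer[j]
        path.append(j)
    path.reverse()
    return path


if __name__ == "__main__":
    demo = [
        [[0.9, 0.1], [0.2, 0.8]],  # hypothetical scores, frame 0 to frame 1
        [[0.7, 0.3], [0.4, 0.6]],  # hypothetical scores, frame 1 to frame 2
    ]
    print(viterbi_track(demo))
```

Working in log space turns the product of probabilities into a sum, which avoids numerical underflow over long sequences. The `eps` floor is an arbitrary choice for this sketch.]]></content:encoded>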
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1800237</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1800237</link>
        <title><![CDATA[Feature representation for explainable CRISPR off-target prediction and base editing efficiency]]></title>
        <pubdate>2026-04-20T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Faiza Hasin</author><author>Michele Minervini</author><author>Corrado Mencar</author><author>Giuseppe Ventrella</author><author>Arianna Consiglio</author><author>Alessandro Orro</author><author>Tommaso Selmi</author>
        <description><![CDATA[Introduction: The interaction between guide RNAs (gRNAs) and target DNA sequences is a critical factor in the effectiveness of CRISPR/Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR-associated protein 9) gene editing. Accurately predicting these interactions requires models that provide biological insight in addition to high accuracy. This study analyzes the impact of feature representation on accuracy and interpretability in off-target prediction. Methods: We address two CRISPR applications, gene knockout (KO) and base editing (BE), using distinct benchmark datasets. For KO, we used CHANGE-seq and GUIDE-seq to evaluate paired sequence representations, while the Hanna screening dataset was used for BE. We framed the prediction problem as both a classification and a regression task using XGBoost models. Results: For KO, no single encoding is universally optimal. For both classification and regression, One-Hot and its variants (OH, OH5C) achieve the best results on GUIDE-seq (AUPR = 0.661, Pearson = 0.756), while the Bulges representation performs best on CHANGE-seq (AUPR = 0.612, Pearson = 0.602). For BE, One-Hot encoding consistently outperforms the K-mer representation in predictive accuracy for both regression and classification (AUPR = 0.723, Pearson = 0.746). Discussion: Our analysis demonstrates comparable predictive performance across gene knockout and base editing tasks, confirming the robustness of the framework across distinct editing domains. Interpretability analysis using SHapley Additive exPlanations (SHAP) reveals that, despite different mechanisms, the Protospacer Adjacent Motif (PAM)-proximal region remains a critical feature for prediction in both editing modalities.]]></description>
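        <content:encoded><![CDATA[To make the contrast between encodings concrete, here is a minimal sketch of two generic representations. The first is a paired one-hot encoding in which each position of an aligned gRNA/target pair gets eight binary channels: four for the gRNA base and four for the target base. The second is a k-mer count vector for a single sequence. Both layouts are assumptions chosen for illustration. The exact OH, OH5C, Bulges and K-mer schemes evaluated in the study may differ, for example in how mismatches, bulges or the PAM are encoded.

```python
from itertools import product
from typing import List

BASES = "ACGT"


def one_hot_pair(grna: str, target: str) -> List[int]:
    """Flattened paired one-hot encoding (8 channels per aligned position).

    Hypothetical layout: channels 0-3 encode the gRNA base and channels 4-7 the
    target base, each in A, C, G, T order. Non-ACGT symbols (e.g. N) stay all-zero.
    """
    if len(grna) != len(target):
        raise ValueError("gRNA and target must be aligned to the same length")
    features: List[int] = []
    for g, t in zip(grna.upper(), target.upper()):
        channels = [0] * 8
        if g in BASES:
            channels[BASES.index(g)] = 1
        if t in BASES:
            channels[4 + BASES.index(t)] = 1
        features.extend(channels)
    return features


def kmer_counts(seq: str, k: int = 2) -> List[int]:
    """Counts of every k-mer over ACGT, in lexicographic order (4**k features)."""
    index = {"".join(kmer): i for i, kmer in enumerate(product(BASES, repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in index:  # skip k-mers containing ambiguous bases
            counts[index[kmer]] += 1
    return counts


if __name__ == "__main__":
    print(one_hot_pair("ACG", "ATG"))  # mismatch at position 2
    print(kmer_counts("ACGTAC", k=2))
```

Both functions return flat integer vectors that can serve as rows of a feature matrix for a gradient-boosted tree model such as XGBoost. The one-hot vector keeps positional information, which lets SHAP attributions be mapped back to individual positions such as the PAM-proximal region. K-mer counts discard position.]]></content:encoded>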
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1767204</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1767204</link>
        <title><![CDATA[Diversity and evolution of quorum-sensing systems in Rhizobium]]></title>
        <pubdate>2026-04-17T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Ivana Blancas-Nava</author><author>Erick Cruz-Santiago</author><author>Gabriela Guerrero</author><author>Rosa-Maria Gutierrez-Rios</author><author>Miguel A. Cevallos</author>
        <description><![CDATA[Quorum-sensing (QS) systems based on acyl-homoserine lactones (AHLs) regulate gene expression in response to cell density in many bacteria, including Rhizobium. These systems, typically composed of LuxI-like synthases and LuxR-like regulators, control processes such as plasmid conjugation, biofilm formation, and plant interactions. However, their evolutionary dynamics and genomic distribution in Rhizobium remain poorly understood. We analyzed 142 complete Rhizobium genomes using comparative genomics, phylogenetic reconstruction, and genomic context analysis. LuxI/LuxR homologs were identified based on sequence similarity and Pfam domain architecture, and their genomic contexts were examined. Phylogenetic relationships and coevolution between LuxI/LuxR pairs were assessed using cophylogenetic approaches. QS systems showed a highly heterogeneous distribution across Rhizobium genomes: some strains lacked canonical systems, whereas others encoded one or multiple systems in chromosomes and/or plasmids. Chromosomal QS systems were associated with multiple distinct genomic contexts, supporting at least seven independent acquisition events. In contrast, plasmid-encoded systems exhibited substantially greater diversity in both sequence and genomic organization. Phylogenetic and comparative analyses revealed dynamic gains and losses of QS systems, variable coevolution among LuxI/LuxR pairs, and evidence of partner recruitment. Notably, plasmids appear to act as major reservoirs of QS systems and likely sources of their transfer to chromosomes. These findings indicate that QS systems in Rhizobium evolve through a combination of horizontal gene transfer, genomic rearrangement, and differential retention across replicons. The higher diversity and mobility of plasmid-encoded systems highlight their central role in shaping QS evolution and functional innovation. Overall, this study provides a comprehensive framework for understanding the diversification and evolutionary trajectories of QS systems in complex multipartite bacterial genomes.]]></description>
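        <content:encoded><![CDATA[The homolog identification and genomic-context steps can be sketched as follows. This minimal example assumes Pfam domain annotations are already available for each protein, for instance from an HMMER search against Pfam. It uses three Pfam family names as the defining architectures: Autoind_synth (LuxI-type autoinducer synthase), and Autoind_bind plus GerE (the autoinducer-binding and LuxR-type HTH DNA-binding domains). The `Gene` record, the classification rule and the `max_gap` neighbourhood threshold are illustrative assumptions, not the criteria used in the study.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Gene(NamedTuple):
    replicon: str               # chromosome or plasmid identifier
    position: int               # gene order along the replicon
    protein_id: str
    domains: Tuple[str, ...]    # Pfam family names, N- to C-terminal


def classify(domains: Sequence[str]) -> str:
    """Assumed rule: synthase domain = LuxI-like; binding + HTH domains = LuxR-like."""
    doms = set(domains)
    if "Autoind_synth" in doms:
        return "LuxI-like"
    if {"Autoind_bind", "GerE"} <= doms:
        return "LuxR-like"
    return "other"


def pair_neighbours(genes: Sequence[Gene], max_gap: int = 3) -> List[Tuple[str, str]]:
    """Pair each LuxI-like gene with the nearest LuxR-like gene on the same replicon."""
    lux_i = [g for g in genes if classify(g.domains) == "LuxI-like"]
    lux_r = [g for g in genes if classify(g.domains) == "LuxR-like"]
    pairs: List[Tuple[str, str]] = []
    for i_gene in lux_i:
        nearby = [r for r in lux_r
                  if r.replicon == i_gene.replicon and abs(r.position - i_gene.position) <= max_gap]
        if nearby:
            nearest = min(nearby, key=lambda r: abs(r.position - i_gene.position))
            pairs.append((i_gene.protein_id, nearest.protein_id))
    return pairs


if __name__ == "__main__":
    genes = [  # hypothetical annotations
        Gene("plasmid_1", 10, "I1", ("Autoind_synth",)),
        Gene("plasmid_1", 11, "R1", ("Autoind_bind", "GerE")),
        Gene("chromosome", 5, "R2", ("Autoind_bind", "GerE")),
        Gene("chromosome", 50, "I2", ("Autoind_synth",)),
    ]
    print(pair_neighbours(genes))
```

Pairing only the nearest LuxR-like gene on the same replicon captures canonical adjacent luxI/luxR loci. LuxI-like or LuxR-like genes left unpaired would be candidates for closer genomic-context and phylogenetic analysis.]]></content:encoded>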
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1806975</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1806975</link>
        <title><![CDATA[Geometric multidimensional representation of omic signatures]]></title>
        <pubdate>2026-04-17T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Higor Almeida Cordeiro Nogueira</author><author>Enrique Medina-Acosta</author>
        <description><![CDATA[Introduction: Multi-omic signatures are widely used in biomarker discovery, precision oncology, and systems biology, yet they are typically treated as vectors or composite scores that collapse intrinsically multidimensional biological organization into one-dimensional summaries. As a result, their internal structure, contextual dependencies, and functional coherence remain largely inaccessible. Methods: Here, we introduce a geometric framework that reconceptualizes omic signatures as multidimensional informational entities whose biological meaning arises from structural organization rather than molecular membership alone. Each signature is embedded in a shared latent space integrating regulatory, phenotypic, microenvironmental, immune, and clinical constraints, and represented as a convex polytope. This representation preserves internal organization and enables intrinsic geometric measurements (including barycenter distance, volume, anisotropy, and asymmetry) that quantify concordance, divergence, and latent complexity. We applied this framework to 24,796 metabolic regulatory circuitries reconstructed across 32 TCGA cancer types, encoded as paired regulatory and metabolic signatures in an 18-dimensional latent space. Results: Geometric analysis shows that discordance predominates: most circuitries occupy strong or extreme discordance regimes and display high-dimensional, frequently asymmetric geometries, whereas fully concordant circuitries are rare and structurally constrained. These geometric phenotypes stratify metabolic pathways and superfamilies in reproducible, non-uniform patterns that are not readily captured by conventional vector- or network-based representations. Discussion: By transforming omic signatures into measurable geometric objects, this framework provides a principled approach for comparing multi-omic biomarkers and reducing redundancy among them, as well as a scalable method for analyzing complex regulatory systems in cancer and beyond. All geometric representations and derived descriptors are available through the SigPolytope Shiny application (https://sigpolytope.shinyapps.io/geometricatlas/).]]></description>
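        <content:encoded><![CDATA[Two of the geometric descriptors listed above, barycenter distance and volume, can be illustrated with a minimal sketch. It assumes each signature is given as a set of points in the shared latent space. The barycenter is taken here as the arithmetic mean of those points, not the volume centroid of the polytope. Volume is shown only in two dimensions, as a convex-hull area computed with Andrew's monotone chain and the shoelace formula, to keep the example dependency-free. These choices are illustrative assumptions. The framework's own definitions, including anisotropy and asymmetry, are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def barycenter(points: Sequence[Point]) -> Point:
    """Arithmetic mean of the points (assumed definition of the barycenter)."""
    if not points:
        raise ValueError("need at least one point")
    dim = len(points[0])
    return tuple(sum(p[k] for p in points) / len(points) for k in range(dim))


def barycenter_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Euclidean distance between the barycenters of two point sets."""
    return math.dist(barycenter(a), barycenter(b))


def convex_hull_2d(points: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Tuple[float, float]] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Tuple[float, float]] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area_2d(points: Sequence[Tuple[float, float]]) -> float:
    """Shoelace area of the convex hull (the 2-D analogue of polytope volume)."""
    hull = convex_hull_2d(points)
    if len(hull) < 3:
        return 0.0
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))
    return abs(twice_area) / 2


if __name__ == "__main__":
    signature_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
    signature_b = [(3.0, 4.0), (4.0, 5.0)]  # hypothetical points
    print(hull_area_2d(signature_a), barycenter_distance(signature_a, signature_b))
```

For higher-dimensional points, such as the 18-dimensional space used here, `scipy.spatial.ConvexHull` exposes a `volume` attribute. A hull in d dimensions needs at least d + 1 affinely independent points to have non-zero volume.]]></content:encoded>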
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1839097</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1839097</link>
        <title><![CDATA[Correction: Protein embeddings reveal a continuous molecular landscape of host adaptation in waterfowl parvoviruses]]></title>
        <pubdate>2026-04-16T00:00:00Z</pubdate>
        <category>Correction</category>
        <author>Nihui Shao</author><author>Yunfei Guo</author>
        <description></description>
      </item>
      </channel>
    </rss>