ORIGINAL RESEARCH article

Front. Bioinform.

Sec. Genomic Analysis

Volume 5 - 2025 | doi: 10.3389/fbinf.2025.1657841

Sequence-Based Prioritization of i-Motif Candidates in the Human Genome

Provisionally accepted
Mauro Fasano*, Veronica Remori, Michela Prest
  • University of Insubria, Varese, Italy

The final, formatted version of the article will be published soon.

Introduction: i-Motifs (iMs) are cytosine-rich, four-stranded DNA structures with emerging roles in gene regulation and genome stability. Despite their biological relevance, genome-wide prediction of iM-forming sequences remains limited by low specificity and high false-positive rates, leading to considerable experimental burden. To address this, we developed a refined computational approach that prioritizes high-confidence iM candidates using a Position-Specific Similarity Matrix (PSSM) derived from multiple sequence alignments. The human reference genome (hg38) was scanned using a custom regular expression targeting cytosine-rich motifs, followed by scoring each sequence with the PSSM. Statistical significance was assessed via permutation testing, one-sided t-tests, Benjamini-Hochberg correction, and Z-scores. Results: This pipeline identified 37,075 candidate sequences (15-46 nucleotides) with strong iM-forming potential. Validation against experimentally confirmed iMs and known G-quadruplexes (G4s) demonstrated significant differences in alignment scores and sequence similarity, confirming structural specificity. A random forest classifier trained on nucleotide features further supported the distinctiveness of the candidates, achieving high classification performance. This work presents a scalable and statistically robust method to enrich for biologically relevant iM sequences, providing a valuable resource for future experimental validation and the rational design of ligands targeting iMs to modulate gene expression in contexts such as cancer.
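The scan-score-test workflow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expression `IM_PATTERN`, the log-odds form of the matrix, the uniform background, the pseudocount, and the toy alignment are all assumptions chosen for illustration, and the sketch handles only ungapped, equal-length sequences. The one-sided t-test mentioned in the abstract is omitted.

```python
import math
import random
import re

# Hypothetical pattern (an assumption, not the authors' custom regex):
# four tracts of 3-5 cytosines separated by loops of 1-7 nucleotides.
IM_PATTERN = re.compile(r"C{3,5}(?:[ACGT]{1,7}?C{3,5}){3}")


def scan(sequence):
    """Return (start, match) pairs for non-overlapping pattern hits."""
    return [(m.start(), m.group()) for m in IM_PATTERN.finditer(sequence.upper())]


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds matrix from ungapped, equal-length aligned sequences.

    Assumes a uniform background (0.25 per base); the source does not
    specify how its matrix is computed.
    """
    pssm = []
    for col in range(len(aligned[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for seq in aligned:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2((counts[b] / total) / 0.25) for b in "ACGT"})
    return pssm


def score(seq, pssm):
    """Sum the per-position matrix entries for a sequence."""
    return sum(column[base] for base, column in zip(seq, pssm))


def permutation_stats(seq, pssm, n_perm=1000, seed=0):
    """Compare the observed score with scores of shuffled copies of seq.

    Returns the observed score, its Z-score against the shuffled scores,
    and a one-sided empirical p-value.
    """
    rng = random.Random(seed)
    observed = score(seq, pssm)
    bases = list(seq)
    null = []
    for _ in range(n_perm):
        rng.shuffle(bases)
        null.append(score(bases, pssm))
    mean = sum(null) / n_perm
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / (n_perm - 1))
    z = (observed - mean) / sd if sd > 0 else 0.0
    p = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    return observed, z, p


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data, invented for illustration only.
toy_alignment = ["CCCTACCCTACCCTACCC", "CCCTTCCCTACCCAACCC", "CCCAACCCTTCCCTACCC"]
toy_pssm = build_pssm(toy_alignment)
hits = scan("TTCCCTACCCTACCCTACCCTT")
results = [permutation_stats(hit, toy_pssm) for _, hit in hits]
adjusted_p = benjamini_hochberg([p for _, _, p in results])
```

In a genome-scale run, the same three steps would be applied per chromosome, with the multiple-testing correction computed once over all candidate p-values rather than per hit.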

Keywords: I-Motif, multiple sequence alignment, position-specific similarity matrix, prioritization, random forest

Received: 01 Jul 2025; Accepted: 28 Jul 2025.

Copyright: © 2025 Fasano, Remori and Prest. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Mauro Fasano, University of Insubria, Varese, Italy

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.