AUTHOR=Li Fuyu, Lu Wenxiang, Bai Yunfei TITLE=HyenaCircle: a HyenaDNA-based pretrained large language model for long eccDNA prediction JOURNAL=Frontiers in Genetics VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2025.1641162 DOI=10.3389/fgene.2025.1641162 ISSN=1664-8021 ABSTRACT=Introduction: Extrachromosomal circular DNA (eccDNA) is a class of circular DNA molecules derived from chromosomes that play diverse roles in disease. Long eccDNAs (typically 1–5 kb) are difficult to detect because of their large size, which hinders functional studies. We propose HyenaCircle, a novel deep learning model that leverages a large language model and third-generation sequencing data to predict long eccDNA formation. Methods: Full-length eccDNAs of 1–5 kb were identified in Nanopore sequencing data with the FLED algorithm. They were extended by 100-bp flanking sequences and paired with 20,000 length-matched negative controls drawn from eccDNA-depleted genomic regions. HyenaCircle was built by adapting the pretrained HyenaDNA model with a purpose-designed classifier head. Data augmentation, regularization, and class-imbalance weighting were applied to improve model robustness. Results: HyenaCircle achieved comparable performance, with a validation AUROC of 0.715 and a recall of 0.776. It surpassed DNABERT by 5.9% in AUROC and showed stable convergence. Hyperparameter optimization identified a batch size of 16 and a learning rate of 5 × 10⁻⁵ as optimal. Ablation studies showed that the flanking sequences are important, because removing them reduced model stability. The model was also more stable than the baseline HyenaDNA architecture. Conclusion: HyenaCircle integrates third-generation sequencing data and a large language model for long eccDNA prediction, and it outperformed the existing model (DNABERT).
Our work demonstrates that the HyenaDNA architecture enables effective long-sequence genomic modeling and provides new insight into eccDNA prediction and identification.