TECHNOLOGY AND CODE article

Front. Bioinform.

Sec. RNA Bioinformatics

This article is part of the Research Topic: AI in RNA Science

TRANSAID: A Hybrid Deep Learning Framework for Translation Site Prediction with Integrated Biological Feature Scoring

Provisionally accepted
Zengding Wu1, Boran Wang2, Zhen Liu3, Wei Wei4, Caiyi Fei1, Shi Xu1, Tiyun Han1, Wei Geng1, Yan Li5*
  • 1Department of AI and Bioinformatics, Nanjing Chengshi Biopharmaceutical (TheraRNA) Co., Ltd., Nanjing, China
  • 2Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  • 3Breast Disease Diagnosis and Treatment Center, Affiliated Hospital of Qinghai University, Affiliated Cancer Hospital of Qinghai University, Xining, China
  • 4Beijing Friendship Hospital, Capital Medical University, Beijing, China
  • 5Peking Union Medical College Hospital Western Branch, Xicheng, China

The final, formatted version of the article will be published soon.

Translation initiation and termination are critical regulatory checkpoints in protein synthesis, yet accurate computational prediction of their sites remains challenging due to training data biases and the complexity of full-length transcripts. To address these limitations, we present TRANSAID (TRANSlation AI for Detection), a novel deep learning framework that accurately and simultaneously predicts translation initiation (TIS) and termination (TTS) sites from complete transcript sequences. TRANSAID's hierarchical architecture efficiently processes long transcripts, capturing both local motifs and long-range dependencies. Crucially, the model was trained on a human transcriptome dataset that was rigorously partitioned at the gene level to prevent data leakage and included both protein-coding (NM) and non-coding (NR) transcripts. This mixed-training strategy enables TRANSAID to achieve high fidelity, correctly identifying 73.61% of NR transcripts as non-coding. Performance is further enhanced by an integrated biological scoring system, improving "perfect ORF prediction" for coding sequences to 94.94% and "correct non-coding prediction" to 82.00%. The human-trained model demonstrates remarkable cross-species applicability, maintaining high accuracy on organisms from mammals to yeast. Beyond annotation, TRANSAID serves as a powerful discovery tool for novel coding events. When applied to long-read sequencing data, it accurately identified previously unannotated protein isoforms validated by mass spectrometry (76.28% validation rate). Furthermore, homology searches of high-scoring ORFs predicted within NR transcripts suggest a strong potential for identifying cryptic translation events. As a fully documented open-source tool with a user-friendly web server, TRANSAID provides a powerful and accessible resource for improving transcriptome annotation and proteomic discovery.
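
The gene-level partitioning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_by_gene`, the `test_fraction` parameter, and the accession and gene identifiers are hypothetical placeholders chosen for illustration. The idea shown is only that all transcripts of a gene are assigned to the same side of the split, so that isoforms sharing sequence cannot appear in both training and evaluation sets.

```python
import random
from collections import defaultdict


def split_by_gene(transcript_to_gene, test_fraction=0.2, seed=0):
    """Split transcripts into train/test so that no gene spans both sets.

    transcript_to_gene: dict mapping transcript ID -> gene ID.
    Hypothetical helper for illustration; not part of TRANSAID.
    """
    # Group transcript IDs under their parent gene.
    by_gene = defaultdict(list)
    for transcript, gene in transcript_to_gene.items():
        by_gene[gene].append(transcript)

    # Shuffle genes (not transcripts) with a fixed seed for reproducibility.
    genes = sorted(by_gene)
    random.Random(seed).shuffle(genes)

    # Hold out whole genes; keep at least one gene in the test set.
    n_test = max(1, round(len(genes) * test_fraction))
    test_genes = set(genes[:n_test])

    train = [t for g in genes if g not in test_genes for t in by_gene[g]]
    test = [t for g in genes if g in test_genes for t in by_gene[g]]
    return train, test


# Made-up identifiers: NM_* stands for coding, NR_* for non-coding transcripts.
example = {
    "NM_000001.1": "GENE_A",
    "NM_000002.1": "GENE_A",
    "NM_000003.1": "GENE_B",
    "NR_000004.1": "GENE_C",
    "NR_000005.1": "GENE_C",
    "NM_000006.1": "GENE_D",
}
train_ids, test_ids = split_by_gene(example, test_fraction=0.25)
```

Splitting on genes rather than on individual transcripts matters because a random transcript-level split would typically place near-identical isoforms on both sides, which inflates measured accuracy.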

Keywords: Translation site prediction, Deep learning, Open reading frame, Integrated scoring system, Cross-species analysis, Transcriptome annotation

Received: 30 Jul 2025; Accepted: 16 Dec 2025.

Copyright: © 2025 Wu, Wang, Liu, Wei, Fei, Xu, Han, Geng and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Yan Li

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.