ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Machine Learning and Artificial Intelligence

Volume 8 - 2025 | doi: 10.3389/frai.2025.1630743

Efficient Spatio-temporal Modeling for Sign Language Recognition using CNN and RNN Architectures

Provisionally accepted
  • 1Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania
  • 2Mzumbe, Morogoro, Tanzania

The final, formatted version of the article will be published soon.

Computer vision has been identified as one of the solutions for bridging communication barriers between speech-impaired populations and those without impairment, as most people do not know the sign language used by speech-impaired individuals. Numerous studies have been conducted to address this challenge. However, recognizing word signs, which are usually dynamic and span more than one frame per sign, remains difficult. This study used Tanzania Sign Language datasets collected with mobile phone selfie cameras to investigate the performance of deep learning algorithms that capture the spatial and temporal features of video frames. The study used CNN-LSTM and CNN-GRU architectures, and proposes a CNN-GRU with an ELU activation function to enhance learning efficiency and performance. The findings indicate that the proposed CNN-GRU model with ELU activation achieved an accuracy of 94%, compared with 93% for the standard CNN-GRU and CNN-LSTM models. Additionally, the study evaluated the performance of the proposed model in a signer-independent setting, where results varied significantly across individual signers, with the highest accuracy reaching 66%. These results show that more work is needed to improve signer-independent performance, including addressing the challenge of hand dominance by optimizing spatial features.
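
For readers unfamiliar with two terms used in the abstract, the following minimal sketch illustrates them. It is not the authors' implementation. The ELU definition is the standard one (identity for positive inputs, `alpha * (exp(x) - 1)` otherwise). The leave-one-signer-out split is only one common way to set up a signer-independent evaluation, and the abstract does not state which protocol was used. The function names, the `alpha` default, and the `(signer_id, video_id, label)` sample layout are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def elu(x: float, alpha: float = 1.0) -> float:
    """Standard ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def relu(x: float) -> float:
    """ReLU, shown only for comparison: negative inputs are clamped to 0."""
    return max(0.0, x)


def leave_one_signer_out(samples):
    """Yield (held_out_signer, train, test) folds with no signer in both sets.

    `samples` is a list of (signer_id, video_id, label) tuples. This layout
    is hypothetical; the abstract does not describe the dataset format.
    """
    by_signer = defaultdict(list)
    for sample in samples:
        by_signer[sample[0]].append(sample)

    for signer in sorted(by_signer):
        test = by_signer[signer]
        # Training data is every sample from every other signer.
        train = [
            s
            for other, group in by_signer.items()
            if other != signer
            for s in group
        ]
        yield signer, train, test


if __name__ == "__main__":
    # ELU keeps a small negative output where ReLU returns exactly zero.
    for value in (-2.0, -0.5, 0.0, 1.5):
        print(value, relu(value), elu(value))

    # Toy data: three hypothetical signers, two sign videos each.
    toy = [
        ("signer_a", "v1", "hello"), ("signer_a", "v2", "thanks"),
        ("signer_b", "v3", "hello"), ("signer_b", "v4", "thanks"),
        ("signer_c", "v5", "hello"), ("signer_c", "v6", "thanks"),
    ]
    for held_out, train, test in leave_one_signer_out(toy):
        print(held_out, len(train), len(test))
```

Under a split like this, accuracy is reported per held-out signer, which is how differences between individual signers, such as those noted in the abstract, would become visible.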

Keywords: CNN-GRU, CNN-LSTM, deep learning, ELU activation function, sign language, Tanzania Sign Language

Received: 18 May 2025; Accepted: 04 Aug 2025.

Copyright: © 2025 Myagila, Nyambo and Dida. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Kasian Myagila, Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.