
ORIGINAL RESEARCH article

Front. Bioeng. Biotechnol.

Sec. Biosensors and Biomolecular Electronics

Volume 13 - 2025 | doi: 10.3389/fbioe.2025.1543688

This article is part of the Research Topic: Biomechanics, Sensing and Bio-inspired Control in Rehabilitation and Assistive Robotics, Volume II.

Multimodal Intelligent Biosensing for Life Assistance Monitoring and Intention Recognition

Provisionally accepted
Danyal Khan1, Haifa Alhasson2, Asaad Algarni3, Nouf Abdullah Almujally4, Ahmad Jalal1*, Hui Liu5*
  • 1Air University, Islamabad, Pakistan
  • 2Qassim University, Ar Rass, Saudi Arabia
  • 3Northern Border University, Arar, Northern Borders, Saudi Arabia
  • 4Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
  • 5University of Bremen, Bremen, Germany

The final, formatted version of the article will be published soon.

Human activity recognition (HAR) is a rapidly evolving field, particularly in applications ranging from healthcare to surveillance, which demand both accuracy and interpretability. This study presents a multimodal HAR framework that incorporates both RGB and inertial sensor data to address the limitations of single-modality approaches. Using the LARA and UTD Multimodal Human Action datasets, the proposed model employs a sequence of advanced pre-processing steps, including frame and skeleton extraction from the RGB data and noise reduction and segmentation of the inertial signals, to capture detailed information on position, structure, and shape over time. The architecture combines these modalities through a feature fusion layer that integrates statistical features from the inertial sensors with spatial features from the RGB-derived skeleton data, producing a diverse, high-dimensional feature space. A Gated Recurrent Unit (GRU) network, well suited to sequential data analysis, processes this fused feature vector. The model was rigorously evaluated using stratified k-fold cross-validation, achieving 96% accuracy on the UTD Multimodal Human Action (UTD-MHAD) dataset and 93% accuracy on the LARA dataset, demonstrating strong performance across different activity types. Furthermore, to enhance model transparency and trust, the SHAP (Explainable AI) interpretability method was applied, which quantifies feature contributions and supports interpretability in real-world applications. By leveraging the benefits of multimodal fusion and handcrafted feature engineering, the proposed model contributes to the development of high-precision, interpretable HAR models for complex activity classification tasks.
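The fusion-then-GRU pipeline described above can be pictured with a minimal sketch. Everything in the snippet below is an illustrative assumption rather than the authors' exact architecture: the feature dimensions, the single-layer GRU, and the class count (27 actions, matching UTD-MHAD) are placeholders, and PyTorch is used only for convenience.

```python
# Minimal sketch of a feature-fusion + GRU classifier for multimodal HAR.
# Dimensions, layer sizes, and class count are illustrative assumptions.
import torch
import torch.nn as nn

class FusionGRUClassifier(nn.Module):
    def __init__(self, inertial_dim=24, skeleton_dim=60, hidden_dim=128, num_classes=27):
        super().__init__()
        # The GRU consumes the per-frame concatenation of both modalities.
        self.gru = nn.GRU(inertial_dim + skeleton_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, inertial_feats, skeleton_feats):
        # inertial_feats: (batch, time, inertial_dim) statistical features per window
        # skeleton_feats: (batch, time, skeleton_dim) spatial features from RGB-derived skeletons
        fused = torch.cat([inertial_feats, skeleton_feats], dim=-1)  # feature fusion layer
        _, last_hidden = self.gru(fused)            # temporal modelling of the fused sequence
        return self.classifier(last_hidden[-1])     # activity logits

# Usage with random tensors standing in for preprocessed sensor data
model = FusionGRUClassifier()
logits = model(torch.randn(4, 50, 24), torch.randn(4, 50, 60))
print(logits.shape)  # torch.Size([4, 27])
```

A SHAP explainer could then be run on the fused feature vector to attribute each prediction to individual inertial or skeleton features, in the spirit of the interpretability analysis the abstract reports.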

Keywords: intelligent sensing, image analysis, human-computer interaction, machine learning models, feature extraction, biosensor devices, healthcare monitoring

Received: 11 Dec 2024; Accepted: 20 Aug 2025.

Copyright: © 2025 Khan, Alhasson, Algarni, Almujally, Jalal and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Ahmad Jalal, Air University, Islamabad, Pakistan
Hui Liu, University of Bremen, 28359 Bremen, Germany

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.