ORIGINAL RESEARCH article
Front. Comput. Sci.
Sec. Mobile and Ubiquitous Computing
Volume 7 - 2025 | doi: 10.3389/fcomp.2025.1569205
This article is part of the Research Topic: Wearable Computing, Volume III
Improving IMU-based Human Activity Recognition Using Simulated Multimodal Representations and a MoE Classifier
Provisionally accepted
1 German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
2 Technical University of Kaiserslautern, Kaiserslautern, Rhineland-Palatinate, Germany
3 Hong Kong University of Science and Technology, Guangzhou, China
The lack of labeled sensor data for Human Activity Recognition (HAR) has driven researchers to synthesize Inertial Measurement Unit (IMU) data from video, exploiting the rich activity annotations available in video datasets. However, current synthetic IMU data often fails to capture subtle, fine-grained motions, limiting its effectiveness in real-world HAR applications. To address these limitations, we introduce Multi³Net+, an advanced framework leveraging cross-modal, multitask representations of text, pose, and IMU data. Building on its predecessor, Multi³Net, it uses improved pre-training strategies and a mixture-of-experts classifier to learn robust joint representations. By applying refined contrastive learning across modalities, Multi³Net+ bridges the gap between video and wearable sensor data, enhancing HAR performance for complex, fine-grained activities. Our experiments show that Multi³Net+ yields significant improvements in generating high-quality synthetic IMU data and achieves state-of-the-art performance in wearable HAR tasks. These results demonstrate the efficacy of our approach in advancing real-world HAR by combining cross-modal learning with multi-task optimization.
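The abstract names two key components: contrastive pre-training that aligns text, pose, and IMU embeddings, and a mixture-of-experts (MoE) classification head. The paper's architectural details are not given on this page, so the sketch below is purely illustrative: a symmetric InfoNCE-style loss applied pairwise across the three modalities, followed by a soft-gated MoE head for downstream classification. All module names, dimensions, and the specific loss form are assumptions, not the authors' implementation.

```python
# A minimal sketch, assuming a PyTorch setup. Hypothetical names and shapes
# throughout; this is not the Multi3Net+ code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning two modality embeddings of shape (batch, dim)."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                    # pairwise cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)  # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

class MoEClassifier(nn.Module):
    """Soft mixture-of-experts head: a gating network weights per-expert class logits."""
    def __init__(self, dim: int, num_classes: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, num_classes))
            for _ in range(num_experts)
        ])
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.gate(z), dim=-1)               # (batch, experts)
        outputs = torch.stack([e(z) for e in self.experts], 1)  # (batch, experts, classes)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)     # gate-weighted class logits

# Pre-training objective: pull matched text/pose/IMU embeddings together pairwise.
text_z, pose_z, imu_z = torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128)
pretrain_loss = info_nce(text_z, imu_z) + info_nce(pose_z, imu_z) + info_nce(text_z, pose_z)

# Downstream HAR: classify IMU embeddings with the MoE head.
head = MoEClassifier(dim=128, num_classes=10)
logits = head(imu_z)  # (8, 10)
```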
Keywords: HAR, sensor simulation, multi-modal learning, pretraining, MoE classifier
Received: 31 Jan 2025; Accepted: 26 Jun 2025.
Copyright: © 2025 Ray, Xia, Rey, Wu and Lukowicz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Lala Shakti Swarup Ray, German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
Qingxin Xia, Hong Kong University of Science and Technology, Guangzhou, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.