AUTHOR=LinLin Hong , Sangheang Lee , GuanTing Song TITLE=CAM-Vtrans: real-time sports training utilizing multi-modal robot data JOURNAL=Frontiers in Neurorobotics VOLUME=Volume 18 - 2024 YEAR=2024 URL=https://www.frontiersin.org/journals/neurorobotics/articles/10.3389/fnbot.2024.1453571 DOI=10.3389/fnbot.2024.1453571 ISSN=1662-5218 ABSTRACT=IntroductionAssistive robots and human-robot interaction have become integral parts of sports training. However, existing methods often fail to provide real-time and accurate feedback, and they often lack integration of comprehensive multi-modal data.MethodsTo address these issues, we propose a groundbreaking and innovative approach: CAM-Vtrans—Cross-Attention Multi-modal Visual Transformer. By leveraging the strengths of state-of-the-art techniques such as Visual Transformers (ViT) and models like CLIP, along with cross-attention mechanisms, CAM-Vtrans harnesses the power of visual and textual information to provide athletes with highly accurate and timely feedback. Through the utilization of multi-modal robot data, CAM-Vtrans offers valuable assistance, enabling athletes to optimize their performance while minimizing potential injury risks. This novel approach represents a significant advancement in the field, offering an innovative solution to overcome the limitations of existing methods and enhance the precision and efficiency of sports training programs.