AUTHOR=Lu Shiyi, Wang Panpan
TITLE=Multi-dimensional fusion: transformer and GANs-based multimodal audiovisual perception robot for musical performance art
JOURNAL=Frontiers in Neurorobotics
VOLUME=Volume 17 - 2023
YEAR=2023
URL=https://www.frontiersin.org/journals/neurorobotics/articles/10.3389/fnbot.2023.1281944
DOI=10.3389/fnbot.2023.1281944
ISSN=1662-5218
ABSTRACT=In the midst of rapid societal evolution, the appreciation of artistic creations has been undergoing continuous transformation. Audience demands have shifted towards experiences that resonate on deeper emotional levels. Against this backdrop, multimodal robot music performance art emerges as a novel form of artistic expression. This paper explores the fusion of music and motion in robot performances to enhance expressiveness and emotional impact. We employ Transformer models to combine audio and video signals, enabling robots to better understand the rhythm, melody, and emotion of music during performances. Generative Adversarial Networks enable robots to create lifelike visual performances based on music, merging auditory and visual perception. Through multimodal reinforcement learning, robots synchronize their actions with music, achieving harmonious alignment between sound and motion. Our experiments validate our approach across diverse music styles and emotions. We use metrics such as accuracy, recall rate, and F1 score to quantify the impact of our methodology. For instance, our approach achieves a performance smoothness score exceeding 94 points, a 95% accuracy rate, and a significant 33% enhancement in performance recall rate compared to baseline modules. The collective elevation in F1 score underscores the advantages of our approach within the realm of robot music performance art.