AUTHOR=Zahra Ishrat, Wu Yanfeng, Alhasson Haifa F., Alharbi Shuaa S., Aljuaid Hanan, Jalal Ahmad, Liu Hui
TITLE=Dynamic graph neural networks for UAV-based group activity recognition in structured team sports
JOURNAL=Frontiers in Neurorobotics
VOLUME=19
YEAR=2025
URL=https://www.frontiersin.org/journals/neurorobotics/articles/10.3389/fnbot.2025.1631998
DOI=10.3389/fnbot.2025.1631998
ISSN=1662-5218
ABSTRACT=

Introduction: Understanding group actions in real-world settings is essential for advancing applications in surveillance, robotics, and autonomous systems. Group activity recognition, particularly in sports, presents unique challenges due to dynamic interactions, occlusions, and varying viewpoints. To address these challenges, we develop a deep learning system that recognizes multi-person behaviors by integrating appearance-based features (HOG, LBP, SIFT), skeletal data (MediaPipe), and motion-context features (MOCON). Our approach employs a Dynamic Graph Neural Network (DGNN) with a Bi-LSTM architecture, enabling robust recognition of group activities in diverse and dynamic environments. To further validate the framework's adaptability, we include evaluations on the Volleyball and UAV-recorded SoccerTrack datasets, which offer distinct perspectives and challenges.

Method: Our framework integrates YOLOv11 for object detection and SORT for tracking, then extracts multi-modal features: HOG, LBP, SIFT, skeletal data (MediaPipe), and motion context (MOCON). These features are optimized using genetic algorithms and fused within the DGNN, which models players as nodes in a spatio-temporal graph, capturing both spatial formations and temporal dynamics.

Results: We evaluated the framework on three datasets: the Volleyball dataset, the UAV-based SoccerTrack soccer dataset, and the NBA basketball dataset. The system achieved 94.5% accuracy on the Volleyball dataset (mAP: 94.2%, MPCA: 93.8%) with an inference time of 0.18 s per frame. On SoccerTrack, accuracy was 91.8% (mAP: 91.5%, MPCA: 90.5%) at 0.20 s per frame, and on the NBA dataset it was 91.1% (mAP: 90.8%, MPCA: 89.8%), also at 0.20 s per frame. These results highlight the framework's accuracy and computational efficiency across different sports and camera perspectives.

Discussion: Our approach demonstrates robust performance in recognizing multi-person actions under diverse conditions, highlighting its adaptability to both conventional and UAV-based video sources.
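
As a rough illustration of the detection and per-player feature-extraction stage described under Method, the Python sketch below runs a person detector and computes HOG and LBP descriptors per detected crop. It assumes the Ultralytics package with its "yolo11n.pt" YOLO11 weights plus OpenCV and scikit-image; the crop size, histogram bins, and the omission of SIFT, MediaPipe skeletons, MOCON, and the SORT tracker are all simplifications for illustration, not the authors' implementation.

import cv2
import numpy as np
from ultralytics import YOLO
from skimage.feature import hog, local_binary_pattern

# Assumed stand-in for the paper's YOLOv11 detector (COCO class 0 = person).
detector = YOLO("yolo11n.pt")

def player_features(frame_bgr):
    """Detect players, then compute a HOG + LBP descriptor per crop."""
    result = detector(frame_bgr, classes=[0], verbose=False)[0]
    feats = []
    for box in result.boxes.xyxy.cpu().numpy().astype(int):
        x1, y1, x2, y2 = box
        crop = cv2.cvtColor(frame_bgr[y1:y2, x1:x2], cv2.COLOR_BGR2GRAY)
        crop = cv2.resize(crop, (64, 128))                       # fixed-size HOG input
        h = hog(crop, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2))                          # appearance/shape
        lbp = local_binary_pattern(crop, P=8, R=1, method="uniform")
        lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10),
                                   density=True)                 # texture histogram
        feats.append(np.concatenate([h, lbp_hist]))
    return feats  # one fused appearance descriptor per detected player

In the full pipeline these descriptors would be associated across frames by the tracker and concatenated with skeletal and motion features before the genetic-algorithm selection step.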
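
The DGNN-plus-Bi-LSTM idea from the abstract, players as nodes in a per-frame graph whose edges change over time, followed by temporal modeling over the frame sequence, can be sketched minimally in PyTorch as follows. Every layer size, the distance-threshold adjacency rule, the mean-pooling readout, and the class count are illustrative assumptions; the paper's actual graph construction and feature-fusion details are not reproduced here.

import torch
import torch.nn as nn

class FrameGraphLayer(nn.Module):
    """One round of message passing over a dynamically built player graph."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, feats, positions, radius=0.2):
        # feats: (P, D) fused per-player features; positions: (P, 2) court coords
        dist = torch.cdist(positions, positions)                 # (P, P) distances
        adj = (dist < radius).float()                            # connect nearby players
        adj = adj / adj.sum(dim=1, keepdim=True).clamp(min=1.0)  # row-normalize
        return torch.relu(self.lin(adj @ feats))                 # aggregate neighbors

class DynamicGraphBiLSTM(nn.Module):
    """Per-frame graph encoding followed by a Bi-LSTM over the frame sequence."""
    def __init__(self, feat_dim=128, hidden=128, num_classes=8):
        super().__init__()
        self.graph = FrameGraphLayer(feat_dim)
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, feats_seq, pos_seq):
        # feats_seq: (T, P, D) per-frame features; pos_seq: (T, P, 2) positions
        frame_embs = [self.graph(f, p).mean(dim=0)               # pool players -> (D,)
                      for f, p in zip(feats_seq, pos_seq)]
        seq = torch.stack(frame_embs).unsqueeze(0)               # (1, T, D)
        out, _ = self.bilstm(seq)
        return self.head(out[:, -1])                             # group-activity logits

# Toy usage: 12 players, 16 frames, 128-D fused features per player.
model = DynamicGraphBiLSTM()
logits = model(torch.randn(16, 12, 128), torch.rand(16, 12, 2))
print(logits.shape)  # torch.Size([1, 8])

Rebuilding the adjacency matrix per frame is what makes the graph "dynamic": formations that drift apart or converge change which players exchange messages, which is the property the abstract credits for capturing spatial formations alongside temporal dynamics.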