ORIGINAL RESEARCH article
Front. Artif. Intell.
Sec. Machine Learning and Artificial Intelligence
This article is part of the Research Topic: Emerging Trends in Representation Learning: Multimodal, Graph-based, and Contrastive Perspectives
ADP-Net: A Hierarchical Attention-Diffusion-Prediction Framework for Human Trajectory Prediction
Provisionally accepted
School of Microelectronics Science and Technology, Sun Yat-sen University, Guangzhou, China
Accurate prediction of human crowd behavior is a significant challenge with critical implications for autonomous systems. The core difficulty lies in building a computational framework that models spatial-temporal dynamics through three essential components: feature extraction, attention propagation, and predictive modeling. Current spatial-temporal graph convolutional networks (STGCNs), which typically employ single-hop neighborhood message passing with optional self-attention mechanisms, exhibit three fundamental limitations: receptive fields restricted by a limited number of propagation steps, poor topological extensibility, and structural inconsistencies between network components. Together, these lead to suboptimal performance. To address these challenges, we establish the theoretical connection between graph convolutional networks (GCNs) and personalized propagation neural architectures, and on that basis propose the Attention Diffusion-Prediction Network (ADP-Net). The framework integrates three key innovations: (1) consistent graph convolution layers with immediate attention mechanisms; (2) multi-scale attention diffusion layers implementing graph diffusion convolution (GDC); and (3) adaptive temporal convolution modules handling multi-timescale variations. The architecture employs a polynomial approximation for GCN operations and an approximate personalized propagation scheme for GDC, enabling efficient multi-hop interaction modeling while maintaining structural consistency across spatial and temporal domains. Comprehensive experiments on standard benchmarks (ETH/UCY and the Stanford Drone Dataset) show state-of-the-art results, with improvements of 4% in Average Displacement Error (ADE) and 26% in Final Displacement Error (FDE) over prior approaches. This advance provides a robust theoretical framework and a practical implementation for crowd behavior modeling in autonomous systems.
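As a rough illustration of how an approximate personalized propagation scheme can reach multi-hop neighbors, the sketch below runs the standard personalized-PageRank power iteration, Z_{k+1} = (1 - alpha) * A_norm * Z_k + alpha * H, on a small pedestrian graph. It is not the authors' implementation: the function names, the teleport probability `alpha`, the step count `num_steps`, and the four-pedestrian chain graph are all assumptions made for this example, and the attention weights that ADP-Net would place on the edges are replaced here by a plain binary adjacency matrix.

```python
def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 because of the self-loops
    return [[a_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def approx_ppr_propagate(adj, features, alpha=0.1, num_steps=10):
    """Approximate personalized-PageRank propagation by power iteration.

    Iterates Z <- (1 - alpha) * A_norm @ Z + alpha * H, starting from Z = H.
    alpha and num_steps are illustrative defaults, not values from the paper.
    """
    a_norm = normalize_adjacency(adj)
    z = [row[:] for row in features]
    for _ in range(num_steps):
        az = matmul(a_norm, z)
        z = [[(1.0 - alpha) * az[i][j] + alpha * features[i][j]
              for j in range(len(features[0]))]
             for i in range(len(features))]
    return z


# Hypothetical scene: four pedestrians interacting in a chain 0-1-2-3.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
# A single feature channel that is active only for pedestrian 0.
signal = [[1.0], [0.0], [0.0], [0.0]]

one_hop = approx_ppr_propagate(chain, signal, num_steps=1)
three_hop = approx_ppr_propagate(chain, signal, num_steps=3)
```

After one step only pedestrian 1 receives any of pedestrian 0's signal, whereas after three steps it has reached pedestrian 3. Each extra iteration widens the receptive field by one hop, and the `alpha * H` term keeps every node anchored to its own input features.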
Keywords: representation learning, graph diffusion convolution, trajectory prediction, graph neural networks, spatio-temporal relational modeling, multi-hop, personalized PageRank
Received: 22 Aug 2025; Accepted: 31 Oct 2025.
Copyright: © 2025 Zhang, Xiao and Yu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Shanlin Xiao, xiaoshlin@mail.sysu.edu.cn
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
