Hierarchical Vector Transformer (HiVT) for Multi-Agent Motion Prediction
HiVT (Hierarchical Vector Transformer) is a method for multi-agent motion prediction. Vectorized approaches are good at capturing the complex interactions in traffic scenes. However, existing motion prediction methods often overlook the symmetries of the problem and carry a high computational cost, which makes real-time online prediction difficult without sacrificing accuracy.
HiVT decomposes multi-agent motion prediction into two parts: local context extraction and global interaction modeling. This design aggregates information at multiple scales, so the large number of agents in a scene can be modeled effectively and efficiently. HiVT also includes a translation-invariant scene representation and rotation-invariant spatial learning modules, which help it extract robust representations that are insensitive to shifts and rotations of the input.
HiVT models the relationships between entities in stages and represents all vectorized entities by relative positions. It has four key components: Scene Representation, Local Encoder, Global Interaction, and Multimodal Future Decoder. Scene Representation extracts vectorized entities from the scene, including agent trajectories and map lane segments, and expresses them in coordinates relative to a central vehicle. The Local Encoder models agent-agent interactions, temporal dependencies, and the relationship between agents and lanes within each local region. Global Interaction passes information between the local regions so that their separate contexts are connected. The Multimodal Future Decoder captures the multi-modal nature of vehicle motion and predicts several candidate future trajectories for every agent.
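The relative-position representation and the invariance properties described above can be illustrated with a small, self-contained sketch. It is not HiVT's implementation. It assumes a trajectory is stored as per-step displacement vectors and that the local frame is aligned with the agent's last observed heading. All function names here are hypothetical.

```python
import math

def to_displacements(points):
    """Turn absolute (x, y) positions into per-step displacement vectors."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]

def rotate(vec, angle):
    """Rotate a 2D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = vec
    return (c * x - s * y, s * x + c * y)

def local_representation(trajectory):
    """Displacements expressed in a frame aligned with the agent's latest heading.

    Assumption: the heading is taken from the last observed displacement.
    """
    disp = to_displacements(trajectory)
    heading = math.atan2(disp[-1][1], disp[-1][0])
    return [rotate(d, -heading) for d in disp]

def transform_scene(points, angle, shift):
    """Apply a global rotation followed by a global translation."""
    moved = []
    for p in points:
        x, y = rotate(p, angle)
        moved.append((x + shift[0], y + shift[1]))
    return moved

# A short observed trajectory in some arbitrary world frame.
trajectory = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.7), (3.0, 1.5)]
# The same trajectory after rotating and shifting the whole scene.
moved = transform_scene(trajectory, angle=0.8, shift=(120.0, -45.0))

original_repr = local_representation(trajectory)
moved_repr = local_representation(moved)

# Largest coordinate-wise gap between the two local representations.
max_diff = max(
    abs(a - b)
    for u, v in zip(original_repr, moved_repr)
    for a, b in zip(u, v)
)
print(f"max difference after rotating and shifting the scene: {max_diff:.2e}")
```

Taking differences between consecutive positions removes the dependence on where the scene sits in world coordinates, and rotating by the agent's own heading removes the dependence on how the scene is oriented, so both versions of the trajectory give the same local features up to floating-point error.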
During training, the model computes the error of every predicted trajectory for every agent, which gives an error matrix of agents by trajectories. For each agent, only the trajectory with the smallest error is used to compute the regression loss. This loss is the negative log-likelihood of the ground-truth positions under a Laplace distribution, so it shrinks as the predicted positions approach the true ones. In addition, a cross-entropy loss is used for classification, where the selected trajectory has a target value of 1 and all other trajectories have a target value of 0.
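The training objective described above can be sketched for a single agent as follows. This is a minimal illustration under stated assumptions, not the original training code. It assumes the best trajectory is chosen by summed Euclidean distance and that the regression loss is averaged over time steps. The names `agent_loss`, `scales`, and `logits` are hypothetical.

```python
import math

def laplace_nll(pred, target, scale):
    """Negative log-likelihood of `target` under Laplace(pred, scale), summed over x and y."""
    return sum(
        math.log(2.0 * b) + abs(t - p) / b
        for p, t, b in zip(pred, target, scale)
    )

def trajectory_error(pred_traj, true_traj):
    """Sum of Euclidean distances between predicted and true positions."""
    return sum(
        math.hypot(p[0] - t[0], p[1] - t[1])
        for p, t in zip(pred_traj, true_traj)
    )

def agent_loss(modes, scales, logits, true_traj):
    """Return (best mode index, regression loss, classification loss) for one agent."""
    # One row of the error matrix: the error of each candidate trajectory.
    errors = [trajectory_error(m, true_traj) for m in modes]
    best = min(range(len(modes)), key=errors.__getitem__)

    # Laplace NLL on the best trajectory only, averaged over time steps.
    reg = sum(
        laplace_nll(p, t, b)
        for p, t, b in zip(modes[best], true_traj, scales[best])
    ) / len(true_traj)

    # Cross-entropy with a one-hot target on the best trajectory,
    # written as log-sum-exp(logits) - logits[best] for numerical stability.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(l - top) for l in logits))
    cls = log_z - logits[best]
    return best, reg, cls

# Toy example: three candidate trajectories of two time steps each.
true_traj = [(1.0, 0.0), (2.0, 0.0)]
modes = [
    [(1.0, 0.5), (2.0, 1.0)],
    [(1.0, 0.0), (2.0, 0.1)],
    [(0.5, -0.5), (1.0, -1.0)],
]
scales = [[(0.5, 0.5), (0.5, 0.5)] for _ in modes]  # hypothetical predicted Laplace scales
logits = [0.1, 1.2, -0.3]                           # hypothetical mode scores

best, reg, cls = agent_loss(modes, scales, logits, true_traj)
print(best, reg, cls)
```

Because only the closest trajectory receives the regression loss, the remaining trajectories are free to cover other plausible futures, while the classification term teaches the model to score the closest one highest.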
Experimental results show that HiVT compares favorably with other models in parameter efficiency, inference speed, and prediction accuracy, and ablation experiments support the contribution of its design choices. This makes HiVT an efficient approach for multi-agent motion prediction in complex traffic scenarios.