Automotive Engineering ›› 2025, Vol. 47 ›› Issue (10): 1861-1871. doi: 10.19562/j.chinasae.qcgc.2025.10.002


  • Supported by: National Key R&D Program of China (2022YFB2503302); National Natural Science Foundation of China (52225212, 52272418, U22A20100)

Vehicle Trajectory Prediction Method Based on Dynamic Attention and Goal-Guided Mechanism

Yue Han1, Yingfeng Cai1, Long Chen1, Xiaoqiang Sun1, Hai Wang2, Ze Liu1, Zhongyu Rao1

  1. Institute of Automotive Engineering, Jiangsu University, Zhenjiang 212013
    2. School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013
  • Received: 2024-11-20  Revised: 2025-03-03  Online: 2025-10-25  Published: 2025-10-20
  • Contact: Yingfeng Cai  E-mail: caicaixiao0304@126.com


Abstract:

In complex traffic scenarios, reliably and efficiently predicting the trajectories of surrounding vehicles is crucial for the safe operation of autonomous vehicles. However, existing prediction methods often incur high computational overhead, making it difficult to achieve real-time trajectory prediction without sacrificing accuracy. Therefore, a method called Dynamic Attention and Goal Guidance (DAGG) is proposed, which accurately captures the dynamics of changing scenes and identifies endpoint goals. To reduce redundant encoding and inference latency in continuous prediction, a local spatiotemporal reference frame is constructed that decouples the intrinsic features of scene instances from their relative information. Furthermore, an efficient and compact triple factorized attention fusion module is designed to aggregate local context features, capturing rich spatiotemporal context. To achieve multimodal prediction and better exploit the scene encoding, the fused scene features are injected into the map information, and a multimodal motion-prediction decoding module is adopted to guide goal selection, yielding high-quality predicted goals while reducing the computational cost of goal-based trajectory generation. Validation on the public Argoverse dataset shows that the proposed method achieves a minimum average displacement error (minADE) of 0.84 m and a minimum final displacement error (minFDE) of 1.26 m, significantly outperforming mainstream baseline models and demonstrating superior predictive capability in complex dynamic scenarios.
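The abstract does not specify the internals of the factorized attention fusion module. The general idea behind such designs, however, is to run attention separately along the temporal, social (agent-to-agent), and map axes of a scene instead of one joint pass over all tokens, which reduces the quadratic attention cost. The sketch below illustrates only this general factorization idea; the shapes, names, and single-head formulation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along one axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    """Single-head scaled dot-product attention; keys/values share their token axis."""
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)  # (..., Nq, Nk)
    return softmax(scores) @ v

def factorized_scene_attention(agents, lanes):
    """Three cheap attention passes instead of one joint pass over the whole scene.

    agents: (A, T, d) features for A agents over T history steps.
    lanes:  (M, d) features for M map (lane) tokens.
    Joint attention over all A*T + M tokens costs O((A*T + M)^2);
    factorizing along the time, agent, and map axes is far cheaper.
    """
    # 1) Temporal: each agent attends over its own T timesteps.
    agents = attend(agents, agents, agents)            # mixes along T
    # 2) Social: at each timestep, agents attend to one another.
    x = np.swapaxes(agents, 0, 1)                      # (T, A, d)
    agents = np.swapaxes(attend(x, x, x), 0, 1)        # mixes along A
    # 3) Map: every agent-time token cross-attends to the lane tokens.
    agents = attend(agents, lanes, lanes)              # scores of shape (A, T, M)
    return agents
```

Each pass attends over a single short axis (T, A, or M), which is what makes the module "compact" relative to full joint scene attention.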

Key words: autonomous driving, trajectory prediction, deep learning, attention mechanism
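The reported metrics follow the standard Argoverse motion-forecasting definitions: over K predicted trajectory modes, minADE is the mean per-step displacement of the best mode and minFDE is the endpoint displacement of the best mode. A minimal NumPy sketch of these definitions (function and variable names are our own, not from the paper):

```python
import numpy as np

def min_ade_fde(pred, gt):
    """minADE/minFDE over K predicted modes.

    pred: (K, T, 2) array of K candidate future trajectories of T (x, y) steps.
    gt:   (T, 2) ground-truth future trajectory.
    Returns (minADE, minFDE) in the units of the inputs (metres on Argoverse).
    """
    # Per-step Euclidean distance to ground truth for each mode: (K, T)
    dist = np.linalg.norm(pred - gt[None], axis=-1)
    ade = dist.mean(axis=1)   # average displacement of each mode
    fde = dist[:, -1]         # final-step displacement of each mode
    return ade.min(), fde.min()
```

On Argoverse the benchmark typically evaluates K = 6 modes over a 3 s (30-step) horizon, so the paper's 0.84 m minADE means the best of the predicted modes deviates from the true trajectory by 0.84 m on average per step.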