Automotive Engineering ›› 2025, Vol. 47 ›› Issue (9): 1674-1685. doi: 10.19562/j.chinasae.qcgc.2025.09.004


  • Supported by:
    National Natural Science Foundation of China (52225212); National Natural Science Foundation of China (52072160); National Key R&D Program of China (2022YFB2503302); Key Research and Development Program of Jiangsu Province (BE2020083-3)

End-to-End Decision-Making Model Based on Reinforcement Learning Incorporating Bird's Eye View Representation

Baixue Tang1,Yingfeng Cai1(),Long Chen1,Hai Wang2,Zhongyu Rao1,Ze Liu1   

  1. Institute of Automotive Engineering, Jiangsu University, Zhenjiang 212013, China
    2. School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013, China
  • Received: 2024-12-20 Revised: 2025-02-18 Online: 2025-09-25 Published: 2025-09-19
  • Contact: Yingfeng Cai E-mail: caicaixiao0304@126.com


Abstract:

End-to-end decision-making and planning models are a hot research direction in autonomous driving. The spatial and temporal inconsistency between sensor signals and action outputs, as well as the convergence difficulties of end-to-end models, greatly limits their practical effectiveness. To address these issues, this paper proposes FB-Roach, an end-to-end reinforcement learning model that integrates bird's-eye view (BEV) prediction. An environmental representation is established through a BEV prediction model: a forward projection module centered on a static look-up table, together with a multi-task backward projection module that fuses temporal information, depth embedding, and semantic embedding, is designed to ensure consistency between input signals and output actions. Furthermore, by incorporating an attention mechanism, a non-recurrent deep network architecture is proposed that effectively fuses BEV and vehicle state information, and the model's action output is optimized with the PPO reinforcement learning algorithm to achieve intelligent decision-making and control for autonomous vehicles. Based on the CARLA simulator, diverse quantitative evaluation metrics are constructed under different benchmarks. The experimental results show that the proposed algorithm outperforms current mainstream algorithms in both model convergence speed and driving-decision safety.
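The abstract states that the action output is optimized with the PPO reinforcement learning algorithm. As a rough illustration only (a pure-Python sketch of the standard PPO clipped surrogate objective, not the authors' implementation; all names here are hypothetical), the core update term can be written as:

```python
import math

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (scalar), pure-Python sketch.

    For each sample: ratio r = exp(log pi_new - log pi_old); take
    min(r * A, clip(r, 1 - eps, 1 + eps) * A) and average over samples.
    The negation is returned because PPO maximizes the surrogate, while
    optimizers conventionally minimize a loss. Clipping keeps each
    policy update close to the behavior policy, which stabilizes
    training of the action output.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)
```

With identical old and new log-probabilities the ratio is 1, so the loss reduces to the negated mean advantage; when the ratio drifts outside [1 - eps, 1 + eps], the clipped branch caps the update magnitude.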

Key words: end-to-end autonomous vehicles, BEV, reinforcement learning, decision-making