汽车工程 (Automotive Engineering), 2023, Vol. 45, Issue 10: 1791-1802. doi: 10.19562/j.chinasae.qcgc.2023.10.002

Special topic: Intelligent and Connected Vehicle Technology, Control, 2023


Reinforcement Learning Based Multi-objective Eco-driving Strategy in Urban Scenarios

Jie Li1, Xiaodong Wu1, Min Xu1, Yonggang Liu2

  1. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240
  2. State Key Laboratory of Mechanical Transmission, Chongqing University, Chongqing 400044
  • Received: 2023-02-28 Revised: 2023-03-28 Online: 2023-10-25 Published: 2023-10-23
  • Corresponding author: Xiaodong Wu E-mail: xiaodongwu@sjtu.edu.cn
  • Funding: National Key Research and Development Program of China (2018YFB0106000); National Natural Science Foundation of China (52172400)

Abstract:

To improve the ride experience of connected and automated vehicles in complex urban traffic scenarios, this paper proposes a multi-objective eco-driving strategy based on deep reinforcement learning that considers driving safety, energy economy, ride comfort, and travel efficiency. First, the state space, action space, and multi-objective reward function of the eco-driving strategy are constructed on the basis of a Markov decision process. Second, a car-following safety model and a traffic light safety model are designed to provide safe-speed suggestions to the eco-driving strategy. Third, a design method for a composite multi-objective reward function that integrates safety constraints and shaping functions is proposed to ensure both training convergence and optimization performance of the reinforcement learning agent. Finally, the effectiveness of the proposed method is verified through hardware-in-the-loop experiments. The results show that the proposed strategy can run in real time on an actual onboard vehicle control unit. Compared with an eco-driving strategy based on the intelligent driver model, the proposed strategy improves the energy economy, ride comfort, and travel efficiency of the vehicle while satisfying the driving safety constraints.
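The abstract gives no formulas, so the following Python sketch is only an illustration of the two ideas it names, not the authors' implementation. `idm_acceleration` is the standard intelligent driver model, which the paper uses as its baseline. `shaped_reward` is a hypothetical composite reward that combines weighted energy, comfort, and efficiency terms with a safe-speed penalty and a potential-based shaping term. All function names, weights, default parameter values, and the potential values `phi_s` and `phi_next` are assumptions introduced here.

```python
import math


def idm_acceleration(v, v_lead, gap, v0=15.0, T=1.5, s0=2.0,
                     a_max=1.5, b=2.0, delta=4.0):
    """Standard intelligent driver model (IDM) acceleration in m/s^2.

    v, v_lead: ego and lead vehicle speeds (m/s).
    gap: bumper-to-bumper distance to the lead vehicle (m).
    All default parameter values are illustrative assumptions.
    """
    dv = v - v_lead  # closing speed, positive when approaching the leader
    # Desired dynamic gap: standstill distance plus time-headway and braking terms.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    free_term = (v / v0) ** delta            # pull toward the desired speed v0
    gap_term = (s_star / max(gap, 1e-3)) ** 2  # repulsion from the leader
    return a_max * (1.0 - free_term - gap_term)


def shaped_reward(energy_kwh, jerk, v, v_safe, phi_s, phi_next, gamma=0.99,
                  w_energy=1.0, w_comfort=0.1, w_speed=0.5,
                  safety_penalty=10.0, v_max=15.0):
    """Hypothetical composite multi-objective reward (not from the paper).

    energy_kwh: energy used in this step; jerk: change of acceleration (m/s^3).
    v_safe: safe speed suggested by a safety model (assumed to be given).
    phi_s, phi_next: potential values of the current and next state.
    """
    # Weighted sum of energy economy, ride comfort, and travel efficiency.
    base = -w_energy * energy_kwh - w_comfort * jerk ** 2 + w_speed * v / v_max
    # Safety constraint: fixed penalty when the safe speed is exceeded.
    if v > v_safe:
        base -= safety_penalty
    # Potential-based shaping term: gamma * Phi(s') - Phi(s).
    shaping = gamma * phi_next - phi_s
    return base + shaping


if __name__ == "__main__":
    print(idm_acceleration(v=10.0, v_lead=8.0, gap=20.0))
    print(shaped_reward(energy_kwh=0.01, jerk=0.5, v=10.0, v_safe=12.0,
                        phi_s=0.2, phi_next=0.3))
```

Potential-based shaping is chosen in this sketch because it is known to leave the optimal policy unchanged. Whether the paper's shaping functions take this form is not stated in the abstract.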

Key words: connected and automated vehicle, eco-driving, deep reinforcement learning, urban traffic scenario, multi-objective optimization