Automotive Engineering (汽车工程), 2024, Vol. 46, Issue 6: 945-955. doi: 10.19562/j.chinasae.qcgc.2024.06.001


  • Funding: National Natural Science Foundation of China (52202487); Fundamental Research Funds for the Central Universities (FRF-OT-23-02)

Vehicle Trajectory Tracking and Collision Avoidance Control Based on Multi-style Reinforcement Learning

Liming Xiao¹, Fawang Zhang², Liangfa Chen¹, Haoqi Yan¹, Fei Ma¹, Shengbo Eben Li³, Jingliang Duan¹

  1. School of Mechanical Engineering, University of Science and Technology Beijing, Beijing 100083
  2. School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081
  3. School of Vehicle and Mobility, Tsinghua University, Beijing 100084
  • Received: 2023-12-13; Revised: 2024-01-12; Online: 2024-06-25; Published: 2024-06-19
  • Contact: Jingliang Duan, E-mail: duanjl@ustb.edu.cn

Abstract:

Trajectory tracking and collision avoidance are key capabilities of an intelligent vehicle. Existing control methods, however, produce only a single control style for a given scenario. To address this limitation, this paper proposes a multi-style reinforcement learning (RL) control method. To diversify control styles, a style indicator is introduced, for the first time, into both the value network and the policy network, yielding a multi-style tracking and collision avoidance policy network. A multi-style policy iteration framework is then built on distributional RL theory, and a multi-style distributional soft actor-critic (M-DSAC) algorithm is derived from that framework. Simulation and real-vehicle tests show that the proposed method completes trajectory tracking and collision avoidance tasks in multiple driving styles (aggressive, neutral, and conservative). On the real vehicle, the steady-state trajectory tracking error is less than 5 cm, indicating high control accuracy, and the average single-step decision time is only 6.07 ms, which meets real-time requirements.

Key words: multi-style, distributional soft actor-critic (DSAC), trajectory tracking, active collision avoidance
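
The abstract describes feeding a style indicator into the policy network so that one policy can behave differently in the same scenario. The following is a minimal sketch of that conditioning idea only, not the M-DSAC algorithm or the authors' architecture. The one-hot style encoding, the layer sizes, the random untrained weights, the class name `StyleConditionedPolicy`, and the example state values are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import random

# Hypothetical one-hot encoding of the three driving styles named in the
# abstract. How the paper actually encodes the style indicator is not stated.
STYLES = {
    "aggressive": [1.0, 0.0, 0.0],
    "neutral": [0.0, 1.0, 0.0],
    "conservative": [0.0, 0.0, 1.0],
}


class StyleConditionedPolicy:
    """Tiny untrained MLP mapping (state, style indicator) to a bounded action."""

    def __init__(self, state_dim, action_dim, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        in_dim = state_dim + 3  # state concatenated with the one-hot style
        # Random weights stand in for parameters that training would learn.
        self.w1 = [[rng.uniform(-1, 1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.w2 = [[rng.uniform(-1, 1) for _ in range(hidden_dim)]
                   for _ in range(action_dim)]

    def act(self, state, style):
        # The style indicator enters the network as extra input features.
        x = list(state) + STYLES[style]
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        # tanh keeps each action component in [-1, 1].
        return [math.tanh(sum(w * hi for w, hi in zip(row, h))) for row in self.w2]


policy = StyleConditionedPolicy(state_dim=4, action_dim=2)
state = [0.3, -0.1, 0.05, 0.8]  # made-up, normalized state values
actions = {name: policy.act(state, name) for name in STYLES}
for name, action in actions.items():
    print(name, [round(a, 3) for a in action])
```

Because the style indicator is part of the input, the same vehicle state maps to a different action for each style without training a separate policy per style. In a full method the value network would receive the same indicator, as the abstract states.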