Automotive Engineering, 2022, Vol. 44, Issue (7): 969-975. doi: 10.19562/j.chinasae.qcgc.2022.07.003

Special topic: Intelligent Connected Vehicle Technology, Planning & Control (2022)


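A Decision-making Method for Longitudinal Autonomous Driving Based on Inverse Reinforcement Learning

Zhenhai Gao, Xiangtong Yan, Fei Gao

  1. State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022
  • Received: 2022-01-05 Revised: 2022-02-16 Published: 2022-07-25 Online: 2022-07-20
  • Corresponding author: Fei Gao E-mail: gaofei123284123@jlu.edu.cn
  • Funding:
    National Key R&D Program of China (2017YFB0102601); National Natural Science Foundation of China (51775236)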

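Abstract:

Deriving autonomous driving decision-making policies from human driver data is an active topic in current autonomous driving research. Most classic reinforcement learning (RL) decision-making methods build the reward function by hand, designing formulas for safety, comfort, and economy, and the resulting policies still differ considerably from those of human drivers. In this paper, a maximum margin inverse reinforcement learning (IRL) algorithm is used: the driver's driving data serve as expert demonstrations, the corresponding reward function is built from them, and longitudinal autonomous driving decisions that imitate the driver are achieved. Simulation results show that, compared with the RL method, the IRL method extracts the reward function automatically from the driver's data, which makes the reward function easier to establish, and the resulting decision policy is more consistent with the driver's behavior.

Key words: autonomous driving, decision-making algorithm, reinforcement learning, inverse reinforcement learning (IRL)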
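The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general max-margin idea, assuming the projection variant of Abbeel and Ng's apprenticeship learning and a linear reward r(s) = w · φ(s). Whether the paper uses this variant is not stated here. The toy MDP (five discretized gap levels with actions BRAKE/HOLD/ACCEL), the one-hot features, γ = 0.9, and every function name are hypothetical illustrations, not the paper's state space, features, or RL solver. Each iteration sets the reward weights w to the difference between the expert's discounted feature expectations and the current projection point, solves the MDP under that reward (the RL step), and stops once the margin ‖w‖ is near zero.

```python
import numpy as np

# Hypothetical toy car-following MDP (not the paper's): the state is a discretized
# gap to the lead vehicle, from 0 (too close) to 4 (too far).
N_STATES = 5
BRAKE, HOLD, ACCEL = 0, 1, 2
ACTIONS = (BRAKE, HOLD, ACCEL)
GAMMA = 0.9
HORIZON = 200  # rollout length; GAMMA**200 is negligible


def step(s, a):
    """Deterministic gap dynamics: braking opens the gap, accelerating closes it."""
    if a == BRAKE:
        return min(s + 1, N_STATES - 1)
    if a == ACCEL:
        return max(s - 1, 0)
    return s


def features(s):
    """One-hot features phi(s); the reward is assumed linear, r(s) = w . phi(s)."""
    phi = np.zeros(N_STATES)
    phi[s] = 1.0
    return phi


def rollout(policy, s0):
    """State sequence produced by a deterministic tabular policy from state s0."""
    states, s = [], s0
    for _ in range(HORIZON):
        states.append(s)
        s = step(s, policy[s])
    return states


def feature_expectations(trajectories):
    """Empirical mu = mean over trajectories of sum_t GAMMA**t * phi(s_t)."""
    mu = np.zeros(N_STATES)
    for traj in trajectories:
        for t, s in enumerate(traj):
            mu += GAMMA ** t * features(s)
    return mu / len(trajectories)


def policy_feature_expectations(policy):
    # One rollout from every gap level, i.e. a uniform initial-state distribution.
    return feature_expectations([rollout(policy, s0) for s0 in range(N_STATES)])


def solve_mdp(w, iters=500):
    """RL step: value iteration under r(s) = w . phi(s), then a greedy policy."""
    r = np.array([w @ features(s) for s in range(N_STATES)])
    v = np.zeros(N_STATES)
    for _ in range(iters):
        q = np.array([[r[s] + GAMMA * v[step(s, a)] for a in ACTIONS]
                      for s in range(N_STATES)])
        v = q.max(axis=1)
    return q.argmax(axis=1)


def max_margin_irl(expert_trajectories, eps=1e-6, max_iter=50):
    """Projection variant of max-margin apprenticeship learning (Abbeel & Ng)."""
    mu_e = feature_expectations(expert_trajectories)
    init = np.full(N_STATES, HOLD)                  # arbitrary initial policy
    candidates = [(None, init, policy_feature_expectations(init))]
    mu_bar = candidates[0][2]
    for _ in range(max_iter):
        w = mu_e - mu_bar                           # reward weights; margin = ||w||
        if np.linalg.norm(w) <= eps:
            break
        policy = solve_mdp(w)
        mu = policy_feature_expectations(policy)
        candidates.append((w, policy, mu))
        d = mu - mu_bar
        if d @ d < 1e-12:                           # no further progress
            break
        # Orthogonal projection of mu_e onto the line through mu_bar and mu.
        mu_bar = mu_bar + (d @ (mu_e - mu_bar)) / (d @ d) * d
    # Simplification: keep the candidate whose mu is closest to the expert's.
    w, policy, _ = min(candidates, key=lambda c: np.linalg.norm(mu_e - c[2]))
    return w, policy


# Synthetic "driver" demonstrations that always settle at gap level 2.
expert = np.array([BRAKE, BRAKE, HOLD, ACCEL, ACCEL])
demos = [rollout(expert, s0) for s0 in range(N_STATES)]
w, learned = max_margin_irl(demos)
print("reward weights:", np.round(w, 2))
print("learned policy:", learned)
```

In this toy setup the synthetic driver keeps gap level 2, and a single iteration suffices: the recovered reward peaks at that level and its greedy policy reproduces the demonstrations. Real driving data would call for continuous states (gap, relative speed, acceleration) and a function-approximation RL solver in place of tabular value iteration.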
