Lane-change Behavior Decision-making of Intelligent Vehicle Based on Imitation Learning and Reinforcement Learning

doi:10.19562/j.chinasae.qcgc.2021.01.008

Abstract

This paper proposes a lane-change behavior decision-making method for intelligent vehicles based on imitation learning and reinforcement learning. The macro decision-making module builds an extreme gradient boosting (XGBoost) model through imitation learning and, from the input information, selects one of three macro instructions (lane keeping, left lane change, or right lane change), thereby determining which lane-change decision sub-problem must be solved. Each detailed decision-making sub-module obtains an optimized policy through deep deterministic policy gradient (DDPG) reinforcement learning and solves the corresponding sub-problem to determine the target position of the ego vehicle, which is then sent to lower-level modules for execution. Simulation results show that the proposed method learns its policy faster than pure reinforcement learning, and that its overall performance is better than that of a finite state machine, behavior-cloning imitation learning, and pure reinforcement learning.

Key words: intelligent vehicle, behavior decision-making, reinforcement learning, imitation learning
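
The two-level structure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `MacroDecision`, `decide`, `toy_macro`, and `toy_policies` are hypothetical, the macro classifier is a hand-written rule standing in for the imitation-learned gradient boosting model, and each sub-policy is a constant standing in for a DDPG-trained actor that outputs a scalar target position.

```python
from enum import Enum
from typing import Callable, Dict, Sequence, Tuple


class MacroDecision(Enum):
    """The three macro instructions named in the abstract."""
    LANE_KEEP = 0
    LEFT_CHANGE = 1
    RIGHT_CHANGE = 2


# Hypothetical types: a state is a flat feature vector, and a sub-policy
# maps it to a scalar target position for the ego vehicle.
State = Sequence[float]
MacroClassifier = Callable[[State], MacroDecision]
SubPolicy = Callable[[State], float]


def decide(
    state: State,
    macro: MacroClassifier,
    sub_policies: Dict[MacroDecision, SubPolicy],
) -> Tuple[MacroDecision, float]:
    """Pick a macro instruction, then let the matching sub-policy refine it."""
    command = macro(state)
    target = sub_policies[command](state)
    return command, target


def toy_macro(state: State) -> MacroDecision:
    # Placeholder rule (assumption): state[0] is the gap to the lead vehicle
    # in metres; a small gap triggers a left lane change.
    return MacroDecision.LEFT_CHANGE if state[0] < 20.0 else MacroDecision.LANE_KEEP


# Placeholder sub-policies (assumption): constant target positions in metres.
toy_policies: Dict[MacroDecision, SubPolicy] = {
    MacroDecision.LANE_KEEP: lambda s: 30.0,
    MacroDecision.LEFT_CHANGE: lambda s: 25.0,
    MacroDecision.RIGHT_CHANGE: lambda s: 25.0,
}

if __name__ == "__main__":
    print(decide([10.0], toy_macro, toy_policies))
    print(decide([50.0], toy_macro, toy_policies))
```

Keeping the macro choice separate from the sub-policies means each sub-policy only has to be trained on the sub-problem it is dispatched for.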

Xiaolin Song, Xin Sheng, Haotian Cao, Mingjun Li, Binlin Yi, Zhi Huang. Lane-change Behavior Decision-making of Intelligent Vehicle Based on Imitation Learning and Reinforcement Learning[J]. Automotive Engineering, 2021, 43(1): 59-67.

Figures/Tables: 14 (Figures 1-10, Tables 1-4)

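Table 3. DDPG training parameter settings

Name	Description	Value
training episodes	Number of training episodes	5000
γ	Discount factor	0.99
$\alpha_Q$	Critic network learning rate	$10^{-4}$
$\alpha_\mu$	Actor network learning rate	$10^{-5}$
batch size	Number of transitions sampled per update	64
replay memory size	Experience replay buffer capacity	8000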
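
The settings in the table above could be held in a small configuration object, as in the sketch below. It is illustrative only: `DDPGConfig`, `ReplayMemory`, and `td_target` are hypothetical names, the replay buffer is a plain fixed-size queue with uniform sampling, and `td_target` is the standard one-step DDPG target y = r + γ·Q'(s', μ'(s')) rather than code taken from the paper.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DDPGConfig:
    """Values copied from the DDPG training parameter table."""
    training_episodes: int = 5000
    gamma: float = 0.99            # discount factor
    critic_lr: float = 1e-4        # alpha_Q
    actor_lr: float = 1e-5         # alpha_mu
    batch_size: int = 64
    replay_memory_size: int = 8000


class ReplayMemory:
    """Fixed-capacity buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity: int) -> None:
        self._buffer = deque(maxlen=capacity)

    def push(self, transition) -> None:
        self._buffer.append(transition)

    def sample(self, batch_size: int):
        # Uniform sampling without replacement (an assumption of this sketch).
        return random.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)


def td_target(reward: float, next_q: float, done: bool, gamma: float) -> float:
    """One-step target for the critic; no bootstrap on terminal transitions."""
    return reward + (0.0 if done else gamma * next_q)


if __name__ == "__main__":
    cfg = DDPGConfig()
    memory = ReplayMemory(cfg.replay_memory_size)
    for step in range(10_000):
        memory.push((step, 0.0, 1.0, step + 1, False))  # dummy transitions
    print(len(memory))                          # capped at the buffer capacity
    print(len(memory.sample(cfg.batch_size)))   # one mini-batch
```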



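Parameter	Description	Value
K	Number of base learners	120
ε	Learning rate	0.1
max_depth	Maximum tree depth	6
min_child_weight	Minimum sum of sample weights in a leaf	5
gamma	Minimum gain required to split a node	0
subsample	Row subsampling ratio	0.8
colsample	Column subsampling ratio	0.8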
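
The parameter names in the table above match those of the XGBoost library, so the settings could be passed to its scikit-learn style classifier roughly as sketched here. Several points are assumptions rather than facts from the source: that K corresponds to `n_estimators` and ε to `learning_rate`, that "colsample" means `colsample_bytree`, and the helper name `build_macro_classifier`, which is hypothetical. The import is kept inside the helper so the parameter dictionary can be inspected without the third-party package installed.

```python
# Table values mapped onto xgboost.XGBClassifier keyword arguments.
# The K, epsilon, and colsample mappings are assumptions of this sketch.
XGB_PARAMS = {
    "n_estimators": 120,        # K: number of base learners
    "learning_rate": 0.1,       # epsilon: shrinkage applied to each tree
    "max_depth": 6,
    "min_child_weight": 5,
    "gamma": 0,                 # minimum gain required to split a node
    "subsample": 0.8,           # fraction of rows sampled per tree
    "colsample_bytree": 0.8,    # fraction of columns sampled per tree
}

# Integer labels for the three macro instructions (hypothetical encoding).
MACRO_LABELS = {0: "lane keeping", 1: "left lane change", 2: "right lane change"}


def build_macro_classifier(params=None):
    """Return an untrained classifier; requires the xgboost package."""
    from xgboost import XGBClassifier  # third-party dependency

    return XGBClassifier(**(params or XGB_PARAMS))


if __name__ == "__main__":
    for name, value in XGB_PARAMS.items():
        print(f"{name} = {value}")
```

Training would then call `fit(X, y)` on demonstration data, with `y` holding the integer labels above; the demonstration data set itself is not described in this section.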

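Network	Layer	Dimensions	Activation
Critic network	Fully connected layer 1	(22, 400)	ReLU
Critic network	Fully connected layer 2	(400, 200)	ReLU
Critic network	Fully connected layer 3	(200, 1)	ReLU
Actor network	Fully connected layer 1	(23, 200)	ReLU
Actor network	Fully connected layer 2	(200, 100)	ReLU
Actor network	Fully connected layer 3	(100, 1)	ReLU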
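
As a quick consistency check on the layer dimensions in the table above, the sketch below verifies that each layer's output width matches the next layer's input width and counts the trainable parameters. The helper names are hypothetical, and the counts assume every fully connected layer has a bias term, which the table does not state.

```python
from typing import List, Tuple

Shape = Tuple[int, int]  # (input width, output width)

# Layer shapes copied from the network table.
CRITIC_LAYERS: List[Shape] = [(22, 400), (400, 200), (200, 1)]
ACTOR_LAYERS: List[Shape] = [(23, 200), (200, 100), (100, 1)]


def shapes_chain(layers: List[Shape]) -> bool:
    """True if each layer's output width equals the next layer's input width."""
    return all(a[1] == b[0] for a, b in zip(layers, layers[1:]))


def dense_param_count(layers: List[Shape], bias: bool = True) -> int:
    """Weights plus (optionally) one bias per output unit, summed over layers."""
    return sum(n_in * n_out + (n_out if bias else 0) for n_in, n_out in layers)


if __name__ == "__main__":
    for name, layers in (("critic", CRITIC_LAYERS), ("actor", ACTOR_LAYERS)):
        print(name, shapes_chain(layers), dense_param_count(layers))
    # critic True 89601
    # actor True 25001
```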

Method	D_FSM	D_IL	D_RL	D_IRL
Number of collisions	0	3	0	0
Mean speed / (km·h^-1)	76.8	81.3	80.7	81.2
Speed standard deviation / (km·h^-1)	10.3	5.8	6.5	5.6
Mean lane changes per episode	6	9	12	8
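
One way to read the comparison table above is to filter out any method with collisions and then rank the remainder by speed and by speed variability. The sketch below does only that, using the numbers from the table; the dictionary keys and the ranking criteria are choices made for this illustration, not a metric defined in the source.

```python
# Values copied from the comparison table; speeds are in km/h.
RESULTS = {
    "D_FSM": {"collisions": 0, "mean_speed": 76.8, "speed_std": 10.3, "lane_changes": 6},
    "D_IL":  {"collisions": 3, "mean_speed": 81.3, "speed_std": 5.8,  "lane_changes": 9},
    "D_RL":  {"collisions": 0, "mean_speed": 80.7, "speed_std": 6.5,  "lane_changes": 12},
    "D_IRL": {"collisions": 0, "mean_speed": 81.2, "speed_std": 5.6,  "lane_changes": 8},
}

# Keep only the methods that never collided.
collision_free = {m: r for m, r in RESULTS.items() if r["collisions"] == 0}

# Among those, find the highest mean speed and the lowest speed variability.
fastest = max(collision_free, key=lambda m: collision_free[m]["mean_speed"])
steadiest = min(collision_free, key=lambda m: collision_free[m]["speed_std"])

if __name__ == "__main__":
    print(sorted(collision_free))   # ['D_FSM', 'D_IRL', 'D_RL']
    print(fastest, steadiest)       # D_IRL D_IRL
```

D_IL reaches a marginally higher mean speed (81.3 km/h) but is excluded here because it recorded three collisions.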
