Administrated by China Association for Science and Technology
Sponsored by China Society of Automotive Engineers
Published by AUTO FAN Magazine Co. Ltd.

Automotive Engineering ›› 2023, Vol. 45 ›› Issue (9): 1637-1645. doi: 10.19562/j.chinasae.qcgc.2023.09.012

Special Issue: Intelligent and Connected Vehicle Technology - Control, 2023


Research on Vehicle Control Algorithm Based on Distributed Reinforcement Learning

Weiguo Liu1,2, Zhiyu Xiang1, Weiping Liu2, Daoxin Qi2, Zixu Wang2

  1. School of Information and Electronic Engineering, Zhejiang University, Hangzhou 310058
  2. National Innovation Center of Intelligent and Connected Vehicles, Beijing 100160
  • Received: 2023-04-18  Revised: 2023-06-23  Online: 2023-09-25  Published: 2023-09-23
  • Contact: Weiguo Liu  E-mail: liuweiguo@china-icv.cn

Abstract:

The development of end-to-end autonomous driving algorithms has become a hot topic in autonomous driving research and development. Classic reinforcement learning (RL) algorithms use vehicle state and environmental feedback to train a driving policy through trial and error, making them a natural route to end-to-end autonomous driving, but their development efficiency remains low. To address the inefficiency and high complexity of training RL algorithms in virtual simulation environments, this paper proposes an asynchronous distributed reinforcement learning framework and builds an intra- and inter-process multi-agent parallel Soft Actor-Critic (SAC) distributed training framework on the Carla simulator to accelerate online RL training. In addition, to enable rapid model training and deployment, the paper proposes a distributed model training and deployment system architecture based on Cloud-OTA, which consists of an over-the-air (OTA) platform, a cloud-based distributed training platform, and an on-vehicle computing platform. On this basis, an Autoware-Carla integrated validation framework based on ROS is established to improve model reusability and reduce migration and deployment costs. Experimental results show that, compared with several mainstream autonomous driving methods, the proposed method trains noticeably faster, copes effectively with dense traffic flow, improves the adaptability of end-to-end driving policies to unseen scenarios, and reduces the time and resources required for experiments in real environments.
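The sketch below illustrates, in simplified form, the inter-process part of the asynchronous training pattern described in the abstract: several worker processes collect experience in parallel while a single learner consumes it. It is not the authors' framework; the Carla-backed environment, the SAC actor, and the SAC gradient step are replaced by placeholders (ToyEnv, the random action, sac_update), and the intra-process multi-agent part (multiple ego vehicles sharing one simulator instance) is not shown.

```python
"""Minimal sketch (assumption, not the paper's code) of asynchronous
inter-process experience collection for distributed RL training."""
import multiprocessing as mp
import random
import time


class ToyEnv:
    """Stand-in for a Carla-backed driving environment (illustrative only)."""
    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        obs = random.random()
        reward = -abs(action - obs)   # dummy reward signal
        done = self.t >= 50
        return obs, reward, done


def worker(worker_id, transition_queue, stop_event):
    """One rollout process: acts with a local policy copy and streams
    (id, obs, action, reward, next_obs, done) tuples to the learner."""
    env = ToyEnv()
    obs = env.reset()
    while not stop_event.is_set():
        action = random.uniform(-1.0, 1.0)   # placeholder for the SAC actor
        next_obs, reward, done = env.step(action)
        transition_queue.put((worker_id, obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs
    # Allow the process to exit even if unread items remain in the queue.
    transition_queue.cancel_join_thread()


def sac_update(replay_buffer):
    """Placeholder for one Soft Actor-Critic gradient step (assumption)."""
    pass


def learner(transition_queue, stop_event, num_updates=200, batch_size=64):
    """Drains asynchronously produced transitions and runs SAC updates."""
    replay_buffer = []
    for _ in range(num_updates):
        while not transition_queue.empty():
            replay_buffer.append(transition_queue.get())
        if len(replay_buffer) >= batch_size:
            sac_update(replay_buffer)   # sample a batch and update here
        time.sleep(0.01)
    stop_event.set()


if __name__ == "__main__":
    queue, stop = mp.Queue(), mp.Event()
    workers = [mp.Process(target=worker, args=(i, queue, stop)) for i in range(4)]
    for p in workers:
        p.start()
    learner(queue, stop)
    for p in workers:
        p.join()
```

In the paper's setting each worker would hold its own Carla client and a copy of the current policy weights, and the learner would periodically broadcast updated weights back to the workers; this sketch only shows the queue-based, asynchronous producer-consumer structure that makes such parallel rollout collection possible.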

Key words: reinforcement learning, distributed system, multi-agent, autonomous driving, Carla, vehicle control