Automotive Engineering (汽车工程) ›› 2023, Vol. 45 ›› Issue (9): 1637-1645. doi: 10.19562/j.chinasae.qcgc.2023.09.012

Special Topic: Intelligent and Connected Vehicle Technology (Control), 2023


Research on Vehicle Control Algorithm Based on Distributed Reinforcement Learning

Weiguo Liu1,2(), Zhiyu Xiang1, Weiping Liu2, Daoxin Qi2, Zixu Wang2

  1. School of Information and Electronic Engineering, Zhejiang University, Hangzhou 310058
  2. National Innovation Center of Intelligent and Connected Vehicles, Beijing 100160
  • Received: 2023-04-18  Revised: 2023-06-23  Online: 2023-09-25  Published: 2023-09-23
  • Corresponding author: Weiguo Liu  E-mail: liuweiguo@china-icv.cn
  • Funding:
    National New Generation Artificial Intelligence Open Innovation Platform for Autonomous Driving (2020AAA0103702)


Abstract:

The development of end-to-end autonomous driving algorithms has become a focus of current autonomous driving research. Classic reinforcement learning algorithms train a vehicle to drive using information such as vehicle state and environmental feedback, obtaining the optimal policy through trial-and-error learning and thereby enabling end-to-end development of autonomous driving algorithms; however, development efficiency remains low. To address the inefficiency and high complexity of training reinforcement learning algorithms in virtual simulation environments, this paper proposes an asynchronous distributed reinforcement learning framework and establishes an inter-process and intra-process multi-agent parallel soft actor-critic (SAC) distributed training framework, accelerating online reinforcement learning training on the Carla simulator. To further enable rapid model training and deployment, this paper also proposes a Cloud-OTA-based distributed model training and deployment system architecture, consisting mainly of an over-the-air technology (OTA) platform, a cloud distributed training platform, and an on-vehicle computing platform. On this basis, to improve model reusability and reduce migration and deployment costs, a ROS-based Autoware-Carla integrated validation framework is built. Experimental results show that, in qualitative comparison with several mainstream autonomous driving methods, the proposed method trains faster, copes effectively with dense-traffic road conditions, improves the adaptability of the end-to-end autonomous driving policy to unknown scenarios, and reduces the time and resources required for experiments in real environments.

Key words: reinforcement learning, distributed system, multi-agent, autonomous driving, Carla, vehicle control

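To illustrate the parallel data-collection pattern the abstract describes, the following minimal Python sketch runs several asynchronous rollout workers, each interacting with its own environment copy and pushing transitions into a shared replay buffer. This is a structural sketch only, not the paper's implementation: `ToyDrivingEnv` is a hypothetical stand-in for a Carla client, and `stochastic_policy` is a placeholder for the SAC actor; the paper additionally parallelizes across processes, which this thread-based sketch omits.

```python
import random
import threading
from collections import deque

class ReplayBuffer:
    """Thread-safe experience replay shared by all rollout workers."""
    def __init__(self, capacity=10_000):
        self._buf = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            self._buf.append(transition)

    def sample(self, batch_size):
        with self._lock:
            return random.sample(self._buf, min(batch_size, len(self._buf)))

    def __len__(self):
        with self._lock:
            return len(self._buf)

class ToyDrivingEnv:
    """Hypothetical stand-in for a Carla client: 1-D speed-tracking task."""
    def __init__(self, target_speed=10.0):
        self.target = target_speed
        self.speed = 0.0

    def reset(self):
        self.speed = 0.0
        return self.speed

    def step(self, throttle):
        self.speed += throttle                     # crude longitudinal dynamics
        reward = -abs(self.target - self.speed)    # penalize speed error
        return self.speed, reward

def stochastic_policy(state, target=10.0):
    """Placeholder for the SAC actor: noisy proportional control."""
    return 0.1 * (target - state) + random.gauss(0.0, 0.05)

def rollout_worker(env, policy, buffer, n_steps):
    """One asynchronous agent: collects (s, a, r, s') into the shared buffer."""
    s = env.reset()
    for _ in range(n_steps):
        a = policy(s)
        s2, r = env.step(a)
        buffer.add((s, a, r, s2))
        s = s2

buffer = ReplayBuffer()
workers = [
    threading.Thread(
        target=rollout_worker,
        args=(ToyDrivingEnv(), stochastic_policy, buffer, 200),
    )
    for _ in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

batch = buffer.sample(64)  # a learner process would fit SAC updates on this
print(len(buffer), len(batch))  # 4 workers x 200 steps -> 800 transitions, batch of 64
```

In the distributed setting described by the abstract, the learner would sample such batches to update the SAC actor and critics, and the updated policy weights would be pushed back to the workers (and, via the OTA platform, to the vehicle).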