Automotive Engineering



A Lane Change Decision Method for Intelligent Connected Vehicles Based on Mixture of Experts Model


1. School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081; 2. Shenzhen Boundless Sensor Technology Co., Ltd., Shenzhen 518000; 3. Shenzhen Automotive Research Institute of Beijing Institute of Technology, Shenzhen 518122; 4. School of Transportation Science and Engineering, Beihang University, Beijing 100191



Abstract: Lane-change decision-making on highways, characterized by complex scenarios, strong uncertainty, and stringent real-time requirements, is a research hotspot and a recognized challenge in the field of autonomous driving both in China and abroad. Deep Reinforcement Learning (DRL) offers good real-time decision-making performance and adaptability to complex scenarios; however, under limited training samples and training cost, its learning effectiveness is constrained, making it difficult to guarantee optimal driving efficiency and complete driving safety. This paper proposes a DRL-Mixture of Experts (DRL-MOE) lane-change decision-making method based on an improved DRL model. First, the upper-level classifier dynamically determines, from the input state features, whether the lower-level DRL expert or the heuristic expert is activated. Then, to improve the learning effectiveness of the DRL expert, the method initializes the neural network parameters by Behavior Cloning (BC), thereby improving the conventional Deep Deterministic Policy Gradient (DDPG) algorithm. Finally, the Intelligent Driver Model (IDM) and the strategy of Minimizing Overall Braking Induced by Lane changes (MOBIL) are designed as heuristic experts to ensure driving safety. Simulation results show that, compared with the non-mixture-of-experts DRL method, the proposed DRL-MOE model improves driving efficiency by 15.04% while ensuring zero collisions and zero road departures, demonstrating higher robustness and superior performance.

Key words: autonomous driving, high-speed lane-change decision-making, deep reinforcement learning, mixture of experts
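For readers who want a concrete picture of the routing idea summarized above, the following is a minimal, illustrative Python sketch of a mixture-of-experts policy in the spirit of the abstract: an upper-level classifier dispatches each state either to a learned (BC-initialized DDPG) actor or to a heuristic IDM/MOBIL expert. All names, thresholds, parameter values, and the toy state layout are hypothetical assumptions for illustration, not the authors' implementation; only the IDM and MOBIL formulas follow their standard published forms.

    # Minimal, illustrative sketch (not the authors' code) of the DRL-MOE routing
    # idea: a gating classifier picks between a learned DDPG expert and a
    # heuristic IDM/MOBIL expert. All names and parameter values are assumptions.
    import numpy as np

    def idm_acceleration(v, v_lead, gap, v0=33.0, T=1.5, a_max=1.5, b=2.0, s0=2.0, delta=4):
        # Intelligent Driver Model: car-following acceleration of the ego vehicle.
        dv = v - v_lead                                            # closing speed
        s_star = s0 + v * T + v * dv / (2.0 * np.sqrt(a_max * b))  # desired gap
        return a_max * (1.0 - (v / v0) ** delta - (s_star / max(gap, 0.1)) ** 2)

    def mobil_wants_change(ego_gain, nf_change, of_change, nf_acc,
                           p=0.5, a_thr=0.2, b_safe=4.0):
        # MOBIL: change lanes only if the new follower stays safe and the
        # politeness-weighted acceleration gain beats the switching threshold.
        if nf_acc < -b_safe:
            return False
        return ego_gain + p * (nf_change + of_change) > a_thr

    def heuristic_expert(state):
        # Safety-oriented expert: IDM longitudinal control plus a MOBIL
        # lane-change flag (+1 = change lane, -1 = keep lane).
        acc = idm_acceleration(state["v"], state["v_lead"], state["gap"])
        change = mobil_wants_change(state["ego_gain"], state["nf_change"],
                                    state["of_change"], state["nf_acc"])
        return np.array([acc, 1.0 if change else -1.0])

    class DRLMOEPolicy:
        # Upper-level classifier dynamically activates either the lower-level
        # DRL expert (a BC-initialized DDPG actor) or the heuristic expert.
        def __init__(self, classifier, ddpg_actor):
            self.classifier = classifier   # features -> 0 (DRL) or 1 (heuristic)
            self.ddpg_actor = ddpg_actor   # actor network warm-started by BC

        def act(self, state, features):
            if self.classifier(features) == 0:
                return self.ddpg_actor(features)   # learned continuous action
            return heuristic_expert(state)         # safe rule-based fallback

    # Toy usage with stand-in gate and actor:
    policy = DRLMOEPolicy(classifier=lambda f: int(f[0] > 0.5),
                          ddpg_actor=lambda f: np.array([0.3, -1.0]))
    state = {"v": 25.0, "v_lead": 22.0, "gap": 30.0,
             "ego_gain": 0.6, "nf_change": -0.1, "of_change": 0.1, "nf_acc": -1.0}
    print(policy.act(state, features=np.array([0.8])))   # routed to the heuristic

In this arrangement the heuristic branch serves as a rule-based fallback, consistent with the abstract's claim of zero collisions and zero departures: whenever the classifier does not trust the learned expert, control reverts to well-understood car-following and lane-change rules. The BC warm start mentioned in the abstract corresponds, in standard practice, to a supervised regression of the actor network onto demonstration state-action pairs before DDPG training begins.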