Automotive Engineering (汽车工程), 2025, Vol. 47, Issue 8: 1513-1521. doi: 10.19562/j.chinasae.qcgc.2025.08.008

Intelligent Vehicle Decision for Roundabouts Based on Subjective Prior Reinforcement Learning

Jian Wu, Yukang Shi, Bing Zhu, Jian Zhao, Zhicheng Chen

  1. Jilin University, State Key Laboratory of Automotive Chassis Integration and Bionics, Changchun 130022
  • Received: 2024-07-18 Revised: 2024-09-18 Online: 2025-08-25 Published: 2025-08-18
  • Contact: Zhicheng Chen E-mail: chenzhicheng@jlu.edu.cn
  • Funding:
    National Natural Science Foundation of China (52302471); National Natural Science Foundation of China (52172386); General Program of the Natural Science Foundation of Jilin Province (20240101121JC); Major Science and Technology Special Project of Changchun, Jilin Province (20220301009GX)

Abstract:

To address the safety problems that intelligent vehicles face in complex, highly interactive roundabout scenarios, a driving decision strategy based on Subjective Prior Deep Reinforcement Learning (SPDRL) is proposed. First, a roundabout scenario model is constructed that comprises a coupled longitudinal and lateral action space, a multi-scale information state space, and a multi-objective reward function. Next, the Soft Actor-Critic (SAC) algorithm, optimized with human preference reinforcement learning theory, is used to design a driving decision strategy that accounts for prior cognition of the risk of agent behavior. A self-learning subjective risk classifier based on a multilayer perceptron evaluates this prior cognition of behavioral risk and guides the driving decisions to converge toward safer outcomes. Finally, the strategy is tested and verified in a CARLA simulation environment. The results show that, compared with the standard SAC algorithm, the proposed strategy improves the safety performance of driving decisions in roundabout scenarios by approximately 8.73%.
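
The abstract does not say how the output of the risk classifier enters the SAC update. As a rough illustration only, the sketch below assumes one possible form: a small multilayer perceptron scores a state-action pair with a risk probability, and that probability is subtracted from the environment reward as a penalty. The names `RiskMLP`, `shaped_reward`, and `risk_weight`, the layer sizes, and the untrained random weights are all hypothetical and are not taken from the paper.

```python
import math
import random


class RiskMLP:
    """Tiny MLP mapping a (state, action) feature vector to a risk probability.

    Hypothetical stand-in for a learned risk classifier; the weights here are
    random and untrained, so the outputs carry no real meaning.
    """

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.b2 = 0.0

    def risk_probability(self, features):
        # Hidden layer with ReLU activation.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # Sigmoid output, read as the probability that the behavior is risky.
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-logit))


def shaped_reward(env_reward, state, action, classifier, risk_weight=1.0):
    """Subtract a risk penalty from the environment reward (assumed form)."""
    p_risk = classifier.risk_probability(list(state) + list(action))
    return env_reward - risk_weight * p_risk


# Example: a 3-dimensional state and a 2-dimensional action
# (for instance, longitudinal acceleration and steering).
state = [0.2, -0.1, 0.5]
action = [0.3, 0.0]
classifier = RiskMLP(in_dim=5, hidden_dim=8)
print(shaped_reward(1.0, state, action, classifier, risk_weight=2.0))
```

In a setup like this, the shaped reward would replace the raw reward in the transitions stored for the SAC update, so behaviors the classifier rates as risky become less attractive to the policy. Whether the paper applies the prior through the reward, the critic target, or the policy loss is not stated in the abstract.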

Key words: intelligent vehicles, roundabout scenarios, driving decision, subjective prior, reinforcement learning