Automotive Engineering (汽车工程) ›› 2023, Vol. 45 ›› Issue (10): 1954-1964. doi: 10.19562/j.chinasae.qcgc.2023.10.016

Special topic: New Energy Vehicle Technology - Electric Drive & Energy Management, 2023


Research on Energy Management Strategy for Hybrid Electric Vehicles Based on Inverse Reinforcement Learning

Chunyang Qi1, Chuanxue Song2, Shixin Song3, Liqiang Jin2, Da Wang3, Feng Xiao1

  1. State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022
  2. College of Automotive Engineering, Jilin University, Changchun 130022
  3. School of Mechanical and Aerospace Engineering, Jilin University, Changchun 130022
  • Received: 2023-03-26 Revised: 2023-05-12 Online: 2023-10-25 Published: 2023-10-23
  • Contact: Feng Xiao E-mail: xiaofengjl@jlu.edu.cn
  • Funding:
    National Key Research and Development Program of China (2021YFB2500704)

Abstract:

The energy management strategy is one of the key technologies of hybrid electric vehicles. As computing power and hardware continue to improve, a growing number of researchers are studying learning-based energy management strategies. In reinforcement-learning-based energy management for hybrid electric vehicles, the reward function determines how the agent's interaction with the environment is directed. However, most current reward functions are designed subjectively or from experience, which makes it difficult to describe the expert's intent objectively; under these conditions there is no guarantee that the agent will learn the optimal driving strategy for the given reward function. To address these problems, this paper proposes an energy management strategy based on inverse reinforcement learning: the reward function weights underlying the expert trajectories are obtained by inverse reinforcement learning and used to guide the behavior of the engine agent and the battery agent. The revised weights are then fed back into forward reinforcement learning for training. Fuel consumption, SOC curves, the reward training process, and power source torques are used to verify the accuracy of the weights and their advantage in fuel-saving capability. Overall, the algorithm improves the fuel-saving effect by 5% to 10%.

Key words: hybrid electric vehicle, maximum entropy inverse reinforcement learning, energy management strategy, forward reinforcement learning
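
The abstract does not state the reward features or the update rule, so the following is only a minimal sketch of the general maximum-entropy inverse reinforcement learning idea, not the authors' implementation. It assumes a linear reward r = w · f over two hypothetical trajectory features (normalized fuel use and SOC deviation) and a small enumerated set of candidate trajectories; all names and numbers are illustrative.

```python
import numpy as np

def expected_features(w, cand):
    """Mean feature vector under P(tau) proportional to exp(w . f(tau))."""
    scores = cand @ w
    scores = scores - scores.max()          # stabilise the softmax
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs @ cand

def maxent_irl_weights(expert_feats, cand_feats, lr=1.0, n_iters=2000):
    """Gradient ascent on the max-ent log-likelihood of the expert data.

    The gradient is (expert mean features) - (model expected features).
    """
    expert_mean = np.asarray(expert_feats, dtype=float).mean(axis=0)
    cand = np.asarray(cand_feats, dtype=float)
    w = np.zeros(cand.shape[1])
    for _ in range(n_iters):
        w += lr * (expert_mean - expected_features(w, cand))
    return w

# Hypothetical features per trajectory: [fuel use, SOC deviation], both in [0, 1].
candidates = np.array([[0.2, 0.1], [0.8, 0.1], [0.2, 0.9], [0.8, 0.9]])
expert = np.array([[0.3, 0.2], [0.4, 0.4]])  # assumed expert demonstrations

w = maxent_irl_weights(expert, candidates)
matched = expected_features(w, candidates)
print("learned weights:", w)
print("model feature expectation:", matched)
```

In this toy setting both learned weights come out negative, meaning the recovered reward penalizes fuel use and SOC deviation in the proportions implied by the demonstrations. Weights recovered this way are what would then be passed to a forward reinforcement learning run in place of hand-tuned ones.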