Automotive Engineering (汽车工程) ›› 2025, Vol. 47 ›› Issue (12): 2336-2345. doi: 10.19562/j.chinasae.qcgc.2025.12.006


Research on Updating Method of Energy Management Strategy for Fuel Cell Bus with Integrated PER and TL

Ruchen Huang, Hongwen He()

  1. National Key Laboratory of Advanced Vehicle Integration and Control, Beijing Institute of Technology, Beijing 100081
  • Received: 2025-04-11  Revised: 2025-05-15  Online: 2025-12-25  Published: 2025-12-19
  • Contact: Hongwen He  E-mail: hwhebit@bit.edu.cn
  • Funding:
    Supported by the General Program of the National Natural Science Foundation of China (52172377), the Young Elite Scientists Sponsorship Program by CAST, doctoral student special plan (China Society of Automotive Engineers), and the Beijing Institute of Technology Graduate Research and Innovation Capability Improvement Program, key project (2024YCXZ019).

Abstract:

To address the low training efficiency and untimely updating of deep reinforcement learning-based energy management strategies (EMSs), this paper takes a fuel cell bus as the research object and proposes an intelligent EMS updating method that integrates prioritized experience replay (PER) and transfer learning (TL). A sampling mechanism-enhanced soft actor-critic (ESAC) algorithm is designed, which improves EMS training efficiency by integrating the PER mechanism into the SAC architecture. A TL-based EMS updating method is then proposed: by exploiting a knowledge-sharing mechanism, the ESAC-based EMS performs cross-cycle knowledge transfer and policy reuse, improving the updating efficiency and long-term optimization performance of the EMS. Finally, the updated EMS is deployed to the energy management controller to optimize power distribution online. Simulation results show that, compared with SAC, the proposed ESAC algorithm improves training efficiency by 58.33%; compared with the baseline method, the proposed updating method improves EMS updating efficiency and fuel economy by 63.01% and 5.24%, respectively, while also demonstrating potential for real-time application.

Key words: transfer learning, prioritized experience replay, soft actor-critic, energy management strategy updating, fuel cell bus
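The PER mechanism that the abstract describes integrating into SAC can be illustrated with a minimal, generic sketch (this is not the authors' implementation; class name, hyperparameters alpha/beta, and the ring-buffer layout are illustrative): transitions are replayed with probability proportional to a power of their priority (typically |TD error|), and importance-sampling weights correct the bias this non-uniform sampling introduces.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional PER buffer: P(i) ~ p_i^alpha, with
    importance-sampling weights w_i = (N * P(i))^(-beta), normalized."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.buffer = []                                  # stored transitions
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0                                      # next write index (ring buffer)

    def add(self, transition):
        # New samples get the current max priority so each is replayed at least once.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, rng=np.random):
        n = len(self.buffer)
        probs = self.priorities[:n] ** self.alpha
        probs /= probs.sum()
        idx = rng.choice(n, batch_size, p=probs)
        # Importance-sampling weights, normalized by their maximum for stability.
        weights = (n * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        # Priority is |TD error| + eps so no transition gets zero probability.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In a SAC-style loop, the critic's TD errors for each sampled minibatch would be fed back through `update_priorities`, so high-error transitions are revisited more often, which is the sampling-efficiency gain the ESAC design targets.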

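Cross-cycle policy reuse via TL can likewise be sketched. The snippet below is a generic illustration under stated assumptions, not the paper's code: a small NumPy MLP stands in for the actor network, and `transfer` seeds a new agent's shallow layers with weights pretrained on a source driving cycle, leaving the deeper layers randomly initialized for fine-tuning on the target cycle. The layer split (`n_shared`) and network sizes are assumptions.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Randomly initialize a small MLP policy as a list of (W, b) pairs."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass with tanh hidden activations and a linear output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def transfer(source_params, target_params, n_shared):
    """Copy the first n_shared layers of the source policy into the target.

    The idea: shallow layers capture cycle-independent features and are
    reused; deeper layers stay fresh and are fine-tuned on the new cycle.
    """
    return ([(W.copy(), b.copy()) for W, b in source_params[:n_shared]]
            + target_params[n_shared:])
```

Starting the target-cycle agent from these transferred weights, rather than from scratch, is the kind of warm start that underlies the reported 63.01% gain in updating efficiency.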