汽车工程 ›› 2023, Vol. 45 ›› Issue (8): 1373-1382. doi: 10.19562/j.chinasae.qcgc.2023.08.008

所属专题: 智能网联汽车技术专题-规划&决策2023年


基于效用理论的运动规划奖励函数设计方法

冉巍1,陈慧1,杨佳鑫1,西村要介2,国朝鵬2,尹又雨3

  1. 同济大学汽车学院,上海 201804
    2.株式会社捷太格特,日本 6348555
    3.捷太格特科技研发中心(无锡)有限公司,无锡 214161
  • 收稿日期:2023-04-11 出版日期:2023-08-25 发布日期:2023-08-17
  • 通讯作者: 陈慧 E-mail: hui-chen@tongji.edu.cn

Design Method of Motion Planning Reward Function Based on Utility Theory

Wei Ran1, Hui Chen1, Jiaxin Yang1, Nishimura Yosuke2, Chaopeng Guo2, Youyu Yin3

  1. School of Automotive Studies, Tongji University, Shanghai 201804
    2. JTEKT CORPORATION, Japan 6348555
    3. JTEKT Research and Development Center (WUXI) Co., Ltd., Wuxi 214161
  • Received:2023-04-11 Online:2023-08-25 Published:2023-08-17
  • Contact: Hui Chen E-mail: hui-chen@tongji.edu.cn

摘要:

实现个性化且符合驾驶员偏好的运动规划,对提高驾驶员对自动驾驶系统的接受度具有重要意义。本文提出了一种考虑驾驶员偏好的运动规划奖励函数设计方法。首先,基于效用理论提出了量化驾驶员轨迹偏好的双层结构模型:上层的效用评估模型量化驾驶员在安全性、舒适性和效率之间的权衡过程;下层的驾驶员感知模型量化驾驶员在安全性、舒适性和效率方面的主观感受与轨迹特征指标之间的关系。接着,分别基于评分和配对比较两种评价方式提出了轨迹偏好模型的估计方法。最后,通过驾驶模拟器评价试验对模型估计方法进行验证:每名试验者分别采用评分和配对比较的方式对多条轨迹进行主观评价;基于获取的两种评价结果及计算得到的轨迹特征,分别用两种方法估计驾驶员轨迹偏好模型。结果表明,所提出的模型能够较为准确地描述驾驶员的偏好评价过程,且基于配对比较的模型估计结果更为准确。

关键词: 效用理论, 运动规划, 奖励函数, 驾驶员偏好, 个性化

Abstract:

Personalized, driver-preferred motion planning is of great importance for improving drivers' acceptance of autonomous driving systems. This paper proposes a design method for a motion planning reward function that accounts for driver preferences. First, a two-layer model for quantifying a driver's trajectory preferences is proposed based on utility theory: the upper-layer utility evaluation model quantifies the driver's trade-off among safety, comfort, and efficiency, while the lower-layer driver perception model quantifies the relationship between trajectory feature indicators and the driver's subjective perception of safety, comfort, and efficiency. Then, two estimation methods for the trajectory preference model are proposed, based on rating and on pairwise comparison, respectively. Finally, the estimation methods are verified through a driving simulator evaluation experiment in which each participant subjectively evaluates multiple trajectories using both rating and pairwise comparison. Based on the two sets of evaluation results and the computed trajectory features, the driver trajectory preference model is estimated with each method. The results show that the proposed model describes the driver's preference evaluation process with good accuracy, and that the estimates obtained from pairwise comparisons are more accurate than those from ratings.
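To make the abstract's construction concrete, the following is a minimal sketch of how a two-layer preference model and a pairwise-comparison likelihood could be set up. It is illustrative only: the feature names (min_ttc_inverse, peak_lat_accel, travel_time), the linear perception functions, the weighted-sum utility, and the logistic choice model are all assumptions, since the paper's exact formulation is not reproduced on this page.

```python
# Minimal illustrative sketch of a two-layer trajectory-preference (utility) model and
# a pairwise-comparison likelihood. All names and functional forms are assumptions.
import numpy as np

def perception_layer(features, theta):
    """Lower layer (assumed linear form): map trajectory feature indicators to the
    driver's perceived safety, comfort, and efficiency."""
    safety = -theta[0] * features["min_ttc_inverse"]   # lower inverse TTC -> feels safer
    comfort = -theta[1] * features["peak_lat_accel"]   # lower lateral accel -> more comfortable
    efficiency = -theta[2] * features["travel_time"]   # shorter travel time -> more efficient
    return np.array([safety, comfort, efficiency])

def utility(features, weights, theta):
    """Upper layer (assumed weighted sum): trade off the three perceived aspects,
    giving a scalar utility that can act as a reward value for a trajectory."""
    return float(np.dot(weights, perception_layer(features, theta)))

def pairwise_log_likelihood(weights, theta, comparisons):
    """Pairwise-comparison estimation sketch: a logistic (Bradley-Terry style) model
    of the probability that trajectory a is preferred over trajectory b."""
    ll = 0.0
    for feat_a, feat_b, a_preferred in comparisons:
        diff = utility(feat_a, weights, theta) - utility(feat_b, weights, theta)
        p_a = 1.0 / (1.0 + np.exp(-diff))
        ll += np.log(p_a if a_preferred else 1.0 - p_a)
    return ll

# Tiny usage example with made-up feature values and parameters.
traj_a = {"min_ttc_inverse": 0.2, "peak_lat_accel": 1.5, "travel_time": 12.0}
traj_b = {"min_ttc_inverse": 0.5, "peak_lat_accel": 3.0, "travel_time": 10.0}
weights = np.array([0.5, 0.3, 0.2])   # relative importance of safety, comfort, efficiency
theta = np.array([1.0, 1.0, 0.1])     # perception scaling parameters
print(pairwise_log_likelihood(weights, theta, [(traj_a, traj_b, True)]))
```

Under these assumptions, maximizing the pairwise log-likelihood over the weights (and, if desired, the perception parameters) yields a utility that can serve as the reward function for ranking candidate trajectories in a motion planner; a rating-based variant would instead regress the recorded rating scores onto the utility values.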

Key words: utility theory, motion planning, reward function, driver preference, personalization