Automotive Engineering ›› 2023, Vol. 45 ›› Issue (10): 1779-1790. doi: 10.19562/j.chinasae.qcgc.2023.10.001

Special Topic: Intelligent and Connected Vehicle Technology (Perception & HMI & Evaluation, 2023)


  • Funding: National Natural Science Foundation of China (52225212)

Pedestrian Crossing Intention Prediction Method Based on Multimodal Feature Fusion

Long Chen¹, Chen Yang¹, Yingfeng Cai¹, Hai Wang², Yicheng Li²

  1. Institute of Automotive Engineering, Jiangsu University, Zhenjiang 212013
    2. School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013
  • Received: 2023-02-13  Revised: 2023-03-14  Online: 2023-10-25  Published: 2023-10-23
  • Contact: Yingfeng Cai, E-mail: caicaixiao0304@126.com


Abstract:

Pedestrian behavior prediction is one of the main challenges faced by the decision-making and planning systems of intelligent vehicles in urban environments, and improving the accuracy of pedestrian crossing intention prediction is of great significance for driving safety. Existing methods rely too heavily on the position of the pedestrian bounding box and rarely consider environmental information in the traffic scene or the interactions between traffic objects. To address these problems, a pedestrian crossing intention prediction method based on multimodal feature fusion is proposed. First, a novel global scene context extraction module and a local scene spatiotemporal feature extraction module are constructed by combining multiple attention mechanisms, enhancing the ability to extract spatiotemporal features of the scene around the vehicle; the semantic parsing results of the scene are then used to capture the interactions between pedestrians and their surroundings, which solves the problem of insufficient use of traffic-environment context and object-interaction information. In addition, a multimodal feature fusion module based on a hybrid fusion strategy is designed, which performs joint reasoning over visual and motion features according to the complexity of the different information sources and provides reliable input to the crossing intention prediction module. Tests on the JAAD dataset show that the proposed method achieves a prediction accuracy of 0.84, a 10.5% improvement over the baseline method. Compared with existing models of the same type, the proposed method achieves the best overall performance and is applicable to a wider range of scenarios.
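The abstract does not specify the hybrid fusion strategy in detail; as a rough, stdlib-only illustration of the general idea (attention-based pooling of each modality, followed by a weighted joint combination of the pooled visual and motion features), the following Python sketch may help. The function names (`attention_pool`, `hybrid_fuse`) and the fixed mixing weight `alpha` are hypothetical and are not taken from the authors' implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(features):
    """Pool a sequence of feature vectors with scaled dot-product
    attention, using the sequence mean as a parameter-free query."""
    d = len(features[0])
    query = [sum(f[i] for f in features) / len(features) for i in range(d)]
    scores = [sum(q * v for q, v in zip(query, f)) / math.sqrt(d)
              for f in features]
    weights = softmax(scores)
    # Weighted sum of the input vectors.
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(d)]

def hybrid_fuse(visual_seq, motion_seq, alpha=0.5):
    """Hybrid fusion sketch: pool each modality separately (late fusion),
    then concatenate the weighted pooled vectors into one joint feature
    (early-style combination) for the downstream intention classifier."""
    v = attention_pool(visual_seq)
    m = attention_pool(motion_seq)
    return [alpha * x for x in v] + [(1 - alpha) * x for x in m]
```

In a real model the pooled features would feed a learned classifier head; here the output is simply the concatenated joint feature vector, e.g. `hybrid_fuse([[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]])` returns a 4-dimensional fused vector.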

Key words: autonomous vehicles, pedestrian intention prediction, multimodal feature fusion, attention mechanism