Automotive Engineering ›› 2024, Vol. 46 ›› Issue (1): 1-8. doi: 10.19562/j.chinasae.qcgc.2024.01.001

• Special Topic: Intelligent Cockpit and Human-Machine Interaction Technology •

Driver Behavior Recognition Method Based on Multi-scale Skeleton Graph and Local Visual Context Fusion

Hongyu Hu, Yechen Li, Zhengguang Zhang, You Qu, Lei He, Zhenhai Gao

  1. State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China
  • Received: 2023-07-26; Revised: 2023-09-09; Online: 2024-01-25; Published: 2024-01-23
  • Contact: Lei He, E-mail: jlu_helei@jlu.edu.cn
  • Supported by: Natural Science Foundation of Jilin Province (20210101064JC); National Natural Science Foundation of China (52272417); Key Technology R&D and Industrialization Project for New Energy Intelligent Vehicles (TC210H02S); College Students' Innovation and Entrepreneurship Training Program (X202310183158)

Abstract:

Recognition of non-driving behaviors is an important means of improving driving safety. Existing recognition methods that fuse skeleton sequences and images suffer from heavy computation and difficult feature fusion. To address these problems, this paper proposes a skeleton-image based behavior recognition network (SIBBR-Net) built on multi-scale skeleton graphs and local visual context. SIBBR-Net extracts motion and appearance features through a graph convolutional network based on multi-scale skeleton graphs and a convolutional neural network based on local vision and attention mechanisms, striking a good balance between representation capability and computational cost. A bidirectional feature guidance strategy based on hand motion, an adaptive feature fusion module, and an auxiliary loss on the static feature space let the motion and appearance features guide each other's updates and fuse adaptively. Evaluated on the Drive&Act dataset, SIBBR-Net achieves an average accuracy of 61.78% on dynamic labels and 80.42% on static labels, at a computational cost of 25.92 GFLOPs (floating-point operations), 76.96% lower than that of the best-performing existing method.

Key words: driver behavior recognition, multi-scale skeleton graph, local visual context, multi-modal data adaptive fusion
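
To make the techniques named above concrete, the following is a minimal, hedged PyTorch sketch, not the authors' SIBBR-Net implementation (which is not given here): one graph-convolution layer over a skeleton adjacency matrix (single-scale, for brevity; the paper's graphs are multi-scale) and a gated module for adaptive fusion of motion and appearance features. All class names, dimensions, and the sigmoid-gate design are illustrative assumptions.

import torch
import torch.nn as nn


class SkeletonGraphConv(nn.Module):
    """One graph-convolution layer over a fixed skeleton adjacency.

    Hypothetical sketch; layer sizes and normalization are assumptions."""

    def __init__(self, in_dim: int, out_dim: int, adjacency: torch.Tensor):
        super().__init__()
        # Symmetrically normalize the adjacency with self-loops: D^-1/2 (A+I) D^-1/2.
        a = adjacency + torch.eye(adjacency.size(0))
        d = a.sum(dim=1).pow(-0.5)
        self.register_buffer("adj", d[:, None] * a * d[None, :])
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, joints, in_dim). Aggregate over neighboring joints, then project.
        return torch.relu(self.linear(self.adj @ x))


class GatedFusion(nn.Module):
    """Adaptive fusion: a learned sigmoid gate weighs motion vs. appearance."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, motion: torch.Tensor, appearance: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([motion, appearance], dim=-1))
        return g * motion + (1 - g) * appearance


if __name__ == "__main__":
    joints, feat_dim = 17, 64               # e.g., a COCO-style 17-joint skeleton
    adjacency = torch.zeros(joints, joints)
    # ... fill adjacency with the skeleton's bone connections ...
    gcn = SkeletonGraphConv(3, feat_dim, adjacency)      # 3 = (x, y, confidence) per joint
    motion = gcn(torch.randn(2, joints, 3)).mean(dim=1)  # pooled motion feature, (2, 64)
    appearance = torch.randn(2, feat_dim)                # stand-in for an image-CNN feature
    fused = GatedFusion(feat_dim)(motion, appearance)    # adaptively fused, (2, 64)

The per-channel sigmoid gate is one common way to realize adaptive multi-modal fusion: the network learns, feature by feature, how much to trust the skeleton branch versus the image branch, rather than committing to fixed fusion weights.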