Automotive Engineering (汽车工程), 2025, Vol. 47, Issue 8: 1479-1489. doi: 10.19562/j.chinasae.qcgc.2025.08.005

Driver Behavior Recognition Method via MobileViT Model and Optical Flow Fusion

Huizhi Xu, Jianzhao Zhang, Xiancai Jiang, Chengju Song

  1. School of Civil Engineering and Transportation, Northeast Forestry University, Harbin 150040
  • Received: 2025-01-07 Revised: 2025-04-17 Online: 2025-08-25 Published: 2025-08-18
  • Contact: Huizhi Xu E-mail: stedu@126.com
  • Supported by:
    Natural Science Foundation of Heilongjiang Province, Joint Fund Cultivation Project (PL2024E012); National Natural Science Foundation of China, Young Scientists Fund (51108137)

Abstract:

This paper proposes Mse-MViT, a driver behavior recognition model built on the MobileViT algorithm that combines a convolutional neural network (CNN) with a Transformer. The model applies an optical flow algorithm recursively to the images and extracts the key frame sequence running from the initial frame to the apex frame of each video clip, which captures the driver's motion information. Using the self-built Driver-vior dataset, the model fuses this motion information with global and local image features through multi-scale feature fusion, a squeeze-and-excitation (SE) attention mechanism, and a dual-branch structure. Experimental results show that Mse-MViT recognizes driver behavior with an accuracy of 95.83% and exhibits improved performance and robustness. In comparative experiments on the State Farm dataset, accuracy improves by 2.5%, which supports the generalization ability and effectiveness of the improved algorithm.

Key words: driver behavior recognition, optical flow algorithm, MobileViT, multi-scale feature fusion
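
The abstract names an SE attention mechanism but does not give its configuration. For readers unfamiliar with the idea, the following is a minimal sketch of a generic squeeze-and-excitation step in plain Python. It is not the authors' implementation: the function name `se_block`, the weight matrices `w1` and `w2`, the omission of bias terms, and the toy input are all assumptions made for illustration.

```python
import math


def se_block(feature_map, w1, w2):
    """Generic squeeze-and-excitation step (illustrative, hypothetical helper).

    feature_map: list of C channels, each an H x W list of lists of floats.
    w1: (C // r) x C weight matrix for the reduction layer (bias omitted).
    w2: C x (C // r) weight matrix for the expansion layer (bias omitted).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, part 1: fully connected reduction followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w1
    ]
    # Excitation, part 2: fully connected expansion followed by sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input (assumed values): two 2 x 2 channels, reduction to one hidden unit.
toy_features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
toy_w1 = [[1.0, 1.0]]
toy_w2 = [[0.0], [1.0]]
toy_scaled, toy_gates = se_block(toy_features, toy_w1, toy_w2)
```

In this toy case the first gate is exactly 0.5 because its expansion weight is zero, so the first channel is halved, while the second channel is scaled by a gate close to 1. In a trained network the weights are learned, so the gates reflect how informative each channel is.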