Automotive Engineering, 2023, Vol. 45, Issue (6): 974-988. doi: 10.19562/j.chinasae.qcgc.2023.06.008

Special topic: Intelligent and Connected Vehicle Technology (Perception, HMI and Evaluation), 2023

Real-Time Detection of Distracted Driving Behavior Based on a Vision Transformer Optimized by Deep Convolution and Token Dimensionality Reduction

Xia Zhao, Zhao Li, Rui Fu, Zhenzhen Ge, Chang Wang

  1. School of Automobile, Chang’an University, Xi’an 710064
  • Received: 2022-11-18 Revised: 2023-01-17 Issue date: 2023-06-25 Online: 2023-06-16
  • Corresponding author: Zhenzhen Ge E-mail: gezhenzhen@chd.edu.cn
  • Funding:
    National Key R&D Program of China (2019YFB1600500)

Abstract:

End-to-end deep convolutional neural network (DCNN) models for driving behavior detection lack the ability to extract global features, while the vision Transformer (ViT) is poor at capturing low-level features and has a large number of parameters. To address these problems, this paper proposes a ViT model optimized with deep convolution and token dimensionality reduction for real-time detection of distracted driving behavior. Comparison experiments against other models, ablation experiments on the proposed model, and visualization of the model's attention regions are conducted to validate the advantages of the proposed model. The proposed model achieves a mean classification accuracy of 96.93% and a precision of 96.95% with 21.22 M parameters, and runs at 23.32 fps in online inference on a real vehicle platform, indicating that it can detect distracted driving behavior in real time. The results can support control strategy development and distraction warning in human-machine co-driving systems.

Key words: automotive engineering, distracted driving behavior detection model, vision Transformer, multi-head attention mechanism, convolutional neural network, token dimensionality reduction
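The abstract does not describe how the token dimensionality reduction is implemented, so the following is only a minimal sketch of one common approach, assumed here rather than taken from the paper: keys and values are average-pooled over r x r windows of the token grid before attention, so each of the N queries attends to N / r² pooled tokens instead of all N. The function names (`reduce_tokens`, `reduced_attention`, `random_matrix`) and the average-pooling choice are hypothetical; a real model would use a deep-learning framework, multiple heads and learned projections.

```python
import math
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows: (n x k) @ (k x m) -> (n x m)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def reduce_tokens(tokens, h, w, r):
    """Average-pool an h x w grid of tokens (row-major list) over r x r windows.

    Returns (h // r) * (w // r) tokens of the same dimension.
    """
    assert h % r == 0 and w % r == 0, "grid must be divisible by r"
    d = len(tokens[0])
    pooled = []
    for i in range(0, h, r):
        for j in range(0, w, r):
            acc = [0.0] * d
            for di in range(r):
                for dj in range(r):
                    t = tokens[(i + di) * w + (j + dj)]
                    for k in range(d):
                        acc[k] += t[k]
            pooled.append([v / (r * r) for v in acc])
    return pooled


def reduced_attention(x, h, w, r, wq, wk, wv):
    """Single-head attention whose keys and values come from pooled tokens.

    x: N = h * w tokens of dimension d; wq, wk, wv: d x d projection matrices.
    Each of the N queries attends to only N / r**2 pooled tokens.
    """
    q = matmul(x, wq)                     # N x d queries from all tokens
    x_small = reduce_tokens(x, h, w, r)   # M = N / r**2 pooled tokens
    k = matmul(x_small, wk)               # M x d keys
    v = matmul(x_small, wv)               # M x d values
    scale = 1.0 / math.sqrt(len(wq[0]))
    k_t = [list(col) for col in zip(*k)]  # d x M (transpose of k)
    scores = matmul(q, k_t)               # N x M instead of N x N
    probs = [softmax([s * scale for s in row]) for row in scores]
    return matmul(probs, v), probs


def random_matrix(n, m):
    """Hypothetical random projection weights for the demo."""
    return [[random.gauss(0.0, 0.5) for _ in range(m)] for _ in range(n)]


if __name__ == "__main__":
    random.seed(0)
    h, w, d, r = 8, 8, 4, 2  # hypothetical 8 x 8 token grid, 4-dim embeddings
    x = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(h * w)]
    wq, wk, wv = (random_matrix(d, d) for _ in range(3))
    out, probs = reduced_attention(x, h, w, r, wq, wk, wv)
    print(len(out), len(out[0]))  # 64 4
    print(len(probs[0]))          # 16
```

With an 8 x 8 grid and r = 2, the attention matrix shrinks from 64 x 64 to 64 x 16 entries; in general the cost of the score computation drops from N² to N²/r², which is the kind of saving that token reduction aims at.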
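The abstract reports mean accuracy, precision and frames per second without stating how the metrics are averaged. The sketch below assumes overall accuracy and macro-averaged (per-class mean) precision computed from a confusion matrix; the 3-class matrix is invented for illustration and is not the paper's data, and the helper names are hypothetical. The last line converts the reported 23.32 fps into an average per-frame time budget.

```python
def accuracy_and_macro_precision(cm):
    """Return (overall accuracy, macro precision) for a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precisions = []
    for j in range(n):
        predicted_as_j = sum(cm[i][j] for i in range(n))  # column sum
        # Precision of class j: correct predictions of j over all predictions of j.
        precisions.append(cm[j][j] / predicted_as_j if predicted_as_j else 0.0)
    return correct / total, sum(precisions) / n


def frame_time_ms(fps):
    """Average time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix (illustrative numbers only).
    cm = [[8, 1, 1],
          [0, 9, 1],
          [1, 0, 9]]
    acc, prec = accuracy_and_macro_precision(cm)
    print(f"accuracy={acc:.4f} macro precision={prec:.4f}")
    print(f"{frame_time_ms(23.32):.2f} ms per frame")  # 42.88 ms per frame
```

If the paper instead averages accuracy per class or across test folds, only the aggregation step changes; the per-class precision definition stays the same.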