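Real-Time Detection of Distracted Driving Behavior Based on Deep Convolution-Tokens Dimensionality Reduction Optimized Visual Transformer Model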

doi:10.19562/j.chinasae.qcgc.2023.06.008

Abstract

Driving behavior detection models based on end-to-end deep convolutional neural networks (DCNNs) lack global feature extraction ability, while the Vision Transformer (ViT) is poor at capturing low-level features and has a large number of parameters. To address both problems, this paper proposes a ViT model based on deep convolution and Tokens dimensionality reduction for real-time detection of distracted driving behavior. Comparison experiments against other models, ablation experiments on the proposed model, and visualization experiments on the model's attention regions are carried out to verify its superiority. The proposed model reaches a mean classification accuracy of 96.93% and a mean precision of 96.95% with 21.22 M parameters, and its online inference speed on a real vehicle platform is 23.32 fps, indicating that it can detect distracted driving behavior in real time. The results can support control strategy development and distraction warning in human-machine co-driving systems.

Key words: automotive engineering, distracted driving behavior detection model, vision Transformer, multi-head attention mechanism, convolutional neural network, Tokens dimensionality reduction

Xia Zhao, Zhao Li, Rui Fu, Zhenzhen Ge, Chang Wang. Real-Time Detection of Distracted Driving Behavior Based on Deep Convolution-Tokens Dimensionality Reduction Optimized Visual Transformer Model[J]. Automotive Engineering, 2023, 45(6): 974-988.

Figures and tables (20): Figures 1-13, Tables 1-7


Model	Image size	Patch size	Layer	Head	Hidden size	FTN	BTN	MLP size	L1	L2
Co-Td-ViT	224	16	7	8	768	197	96	612	4	3
Td-ViT	224	16	7	8	768	197	96	612	4	3
Co-ViT	224	16	7	8	768	197	-	612	-	-
ViT-1	224	16	8	8	768	197	-	612	-	-
ViT-2	224	16	12	12	768	197	-	1024	-	-
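
The FTN value of 197 in the table above agrees with the usual ViT token count for a 224 × 224 input split into 16 × 16 patches, plus one class token. The sketch below checks that arithmetic. Reading FTN as the token count before reduction and BTN as the count after reduction is an assumption, since the table does not define either abbreviation, and `vit_token_count` is a hypothetical helper name.

```python
def vit_token_count(image_size: int, patch_size: int, class_token: bool = True) -> int:
    """Number of tokens a standard ViT produces for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + (1 if class_token else 0)


# Values from the configuration table: Image size 224, Patch size 16.
full_tokens = vit_token_count(224, 16)  # 14 * 14 patches + 1 class token

# BTN column. Assumption: this is the number of tokens kept after reduction.
reduced_tokens = 96

# Self-attention cost grows roughly with the square of the token count,
# so this ratio is only a rough estimate of the per-layer attention saving.
attention_cost_ratio = (reduced_tokens / full_tokens) ** 2

print(f"tokens before reduction: {full_tokens}")
print(f"approximate attention cost ratio after reduction: {attention_cost_ratio:.3f}")
```

If the assumption about BTN holds, layers that operate on 96 tokens would spend roughly a quarter of the attention computation of layers that operate on all 197 tokens.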

Model	mAcc/%	mP/%	mR/%	mF₁/%	FPS	Parameters/M
DenseNet	96.31	96.34	96.34	96.32	146.33	18.11
ResNet-101	96.45	96.48	96.50	96.47	183.00	42.52
EfficientNet	93.82	93.86	93.84	93.83	154.21	53.52
Inception-v4	92.81	92.83	92.79	92.78	160.82	41.16
Swin	95.02	95.04	95.03	95.03	127.95	27.53
Co-Td-ViT	96.93	96.95	96.95	96.94	279.56	21.22

Driver behavior	Co-Td-ViT	DenseNet	ResNet-101	EfficientNet	Inception-v4	Swin
Two-handed driving	98.37	97.56	97.98	94.86	93.77	97.23
Looking at phone	98.19	98.18	97.83	94.27	94.62	94.68
Phone navigation	96.96	96.93	97.32	91.76	90.6	94.3
Operating the center console	97.32	96.46	96.48	94.20	91.23	94.62
Drinking water	97.45	96.73	97.09	94.89	91.23	93.86
Talking on the phone	93.62	92.31	92.61	90.46	89.82	92.70
Turning back to chat	98.88	98.50	98.50	95.90	97.00	97.05
One-handed driving	94.86	94.07	94.07	94.61	94.40	95.93
Mean	96.95	96.34	96.48	93.86	92.83	95.05

Driver behavior	Co-Td-ViT	DenseNet	ResNet-101	EfficientNet	Inception-v4	Swin
Two-handed driving	94.53	93.75	94.53	93.75	94.14	96.09
Looking at phone	95.77	95.07	95.07	92.61	92.96	94.01
Phone navigation	97.70	96.93	97.32	93.87	92.34	95.02
Operating the center console	98.64	98.64	99.10	95.48	94.12	95.48
Drinking water	97.45	96.73	97.09	94.55	94.55	94.55
Talking on the phone	98.51	98.51	98.13	95.52	92.16	94.78
Turning back to chat	97.79	96.69	96.32	94.49	95.22	96.69
One-handed driving	95.24	94.44	94.44	90.48	86.90	93.65
Mean	96.95	96.35	96.50	93.84	92.79	95.03
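
The mean rows of the two per-class tables above can be related to the mP, mR and mF₁ columns by macro-averaging, that is, taking an unweighted mean over the eight behavior classes. The sketch below does this for the Co-Td-ViT column. It assumes that the first per-class table reports precision and the second reports recall; the tables carry no captions here, and this reading is inferred only from their mean rows matching the mP and mR columns. The helper names are illustrative.

```python
# Co-Td-ViT column of the two per-class tables, in row order (values in %).
# Assumption: first table = per-class precision, second table = per-class recall.
precision = [98.37, 98.19, 96.96, 97.32, 97.45, 93.62, 98.88, 94.86]
recall = [94.53, 95.77, 97.70, 98.64, 97.45, 98.51, 97.79, 95.24]


def macro_mean(values):
    """Unweighted mean over classes (macro average)."""
    return sum(values) / len(values)


def f1(p, r):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Macro F1 averages the per-class F1 scores; it is not the F1 of the two means.
per_class_f1 = [f1(p, r) for p, r in zip(precision, recall)]

macro_precision = macro_mean(precision)
macro_recall = macro_mean(recall)
macro_f1 = macro_mean(per_class_f1)

print(f"macro precision: {macro_precision:.2f}")
print(f"macro recall:    {macro_recall:.2f}")
print(f"macro F1:        {macro_f1:.2f}")
```

Under these assumptions the three means land within about 0.01 percentage points of the reported mP, mR and mF₁ for Co-Td-ViT. Small differences are expected because the per-class entries are already rounded.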

Model	mAcc/%	mP/%	mR/%	mF₁/%	FPS	Parameters/M
ViT-1	92.77	92.81	92.82	92.79	291.89	20.90
ViT-2	93.39	93.43	93.43	93.41	198.25	48.0
Co-ViT	94.93	94.95	94.94	94.93	317.26	23.14
Td-ViT	94.40	94.42	94.45	94.42	314.38	18.45
Co-Td-ViT	96.93	96.95	96.95	96.94	279.56	21.22
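
One way to read the ablation table above is to express each variant as a difference from a reference row. The sketch below uses ViT-1 as that reference; this choice is an assumption made for illustration, since the table itself does not name a baseline, and `deltas_vs` is a hypothetical helper.

```python
# Rows of the ablation table: (mAcc/%, FPS, parameters/M).
ablation = {
    "ViT-1": (92.77, 291.89, 20.90),
    "ViT-2": (93.39, 198.25, 48.0),
    "Co-ViT": (94.93, 317.26, 23.14),
    "Td-ViT": (94.40, 314.38, 18.45),
    "Co-Td-ViT": (96.93, 279.56, 21.22),
}


def deltas_vs(baseline_name, table):
    """Return (accuracy, FPS, parameter) differences of each row from the baseline row."""
    base = table[baseline_name]
    return {
        name: tuple(round(value - ref, 2) for value, ref in zip(row, base))
        for name, row in table.items()
        if name != baseline_name
    }


# Assumption: ViT-1 is treated as the reference configuration.
deltas = deltas_vs("ViT-1", ablation)

for name, (d_acc, d_fps, d_params) in deltas.items():
    print(f"{name}: mAcc {d_acc:+.2f} points, FPS {d_fps:+.2f}, parameters {d_params:+.2f} M")
```

Read this way, Co-ViT and Td-ViT each improve mean accuracy over ViT-1 (by 2.16 and 1.63 points), and Co-Td-ViT improves it by 4.16 points while adding 0.32 M parameters.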