Administered by China Association for Science and Technology
Sponsored by China Society of Automotive Engineers
Published by AUTO FAN Magazine Co. Ltd.

Automotive Engineering ›› 2024, Vol. 46 ›› Issue (12): 2290-2302. doi: 10.19562/j.chinasae.qcgc.2024.12.015


SFW-YOLOv8 Complex Scene Video Vehicle Detection Model

Qin Zhu1,2, Shenyang Han2, Mingru Zeng2, Pinghong Lai3, Chuimao Wu2, Weiyi Hu2

    1. School of Public Policy and Management, Nanchang University, Nanchang 330036
    2. School of Information Engineering, Nanchang University, Nanchang 330036
    3. Jiangxi Provincial People's Hospital, Nanchang 330038
  • Received:2024-04-26 Revised:2024-06-12 Online:2024-12-25 Published:2024-12-20
  • Contact: Mingru Zeng E-mail:zeng_mr@163.com

Abstract:

Video vehicle detection models struggle to extract rich target features in complex traffic-monitoring scenes. To address this problem, this paper builds a new spatio-temporal feature fusion module, SF-Module, that makes full use of the spatio-temporal feature information in video images. The module uses the multi-head self-attention mechanism of the Transformer model to extract and fuse temporal and spatial features from the current and historical frames of vehicle video, enriching the feature information of the target. SF-Module is then integrated into the neck of the YOLOv8 network to mine the spatio-temporal features of video image sequences. In addition, the WIoU loss function is introduced as the bounding-box regression loss to reduce the harmful gradients produced by low-quality annotated boxes. Together, these changes yield the SFW-YOLOv8 video vehicle detection model. Finally, the model is evaluated on the UA-DETRAC dataset, with some images augmented by simulated rain and fog to improve the model's generalization. Experimental results show that SFW-YOLOv8 reaches an mAP50 of 79.1% and an mAP50:5:95 of 63.6%, which are 1.7% and 3.3% higher than those of the YOLOv8 model, respectively. Inference takes 11 ms/frame, indicating excellent detection performance.

Key words: vehicle target detection, spatio-temporal feature fusion, Transformer, YOLOv8, attention mechanism
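
The abstract names WIoU as the bounding-box regression loss but does not say which variant is used. As a rough illustration only, the sketch below computes the simplest form (commonly called WIoU v1), in which the IoU loss is scaled by a distance-based factor derived from the box centers and the smallest enclosing box. The function names, the (x1, y1, x2, y2) box format, and the choice of v1 are assumptions made here, not details taken from the paper. In a training framework the enclosing-box term is normally excluded from the gradient, which plain Python does not model.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred, target):
    """Hypothetical WIoU v1 sketch: exp(center distance term) * (1 - IoU)."""
    # Centers of the predicted and ground-truth boxes.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    # Width and height of the smallest box enclosing both boxes
    # (assumed positive, i.e. both inputs are valid non-degenerate boxes).
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    # Distance factor: grows as the centers move apart.
    dist_sq = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    r_wiou = math.exp(dist_sq / (wg ** 2 + hg ** 2))
    return r_wiou * (1.0 - iou(pred, target))


if __name__ == "__main__":
    print(wiou_v1_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

A perfectly aligned prediction gives a loss of zero, while an offset prediction is penalized more than the plain IoU loss alone would penalize it, because the distance factor is always at least 1.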