Administrated by China Association for Science and Technology
Sponsored by China Society of Automotive Engineers
Published by AUTO FAN Magazine Co. Ltd.

Automotive Engineering ›› 2025, Vol. 47 ›› Issue (5): 829-838. doi: 10.19562/j.chinasae.qcgc.2025.05.004


Multi-object Detection Algorithm Based on Camera and Radar Fusion for Autonomous Driving Scenarios

Chenyu Liu1, Hai Wang1, Yingfeng Cai2, Long Chen2

  1. School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013
    2. Institute of Automotive Engineering, Jiangsu University, Zhenjiang 212013
  • Received: 2024-11-07 Revised: 2025-01-07 Online: 2025-05-25 Published: 2025-05-20
  • Contact: Hai Wang, E-mail: wanghai1019@163.com

Abstract:

Autonomous driving systems demand efficient and accurate perception, yet relying solely on cameras makes it challenging to achieve high-precision and robust 3D object detection. An effective solution is to combine cameras with cost-effective millimeter-wave radar sensors, enabling more reliable multimodal 3D object detection that not only improves the accuracy of environmental perception but also enhances the system's robustness and safety. In this paper, an autonomous driving perception algorithm based on the fusion of millimeter-wave radar and cameras, named HPR-Det (historical pillar of ray camera-radar fusion bird's eye view for 3D object detection), is proposed. Specifically, a radar BEV (bird's eye view) feature extraction module called Radar-PRANet (radar point RCS attention net) is designed first. It comprises a dual-stream radar backbone that extracts radar features in two representations, and an RCS-aware BEV encoder that distributes radar features over the BEV space according to radar-specific RCS (radar cross-section) characteristics. Secondly, a historical-radar object prediction paradigm is adopted, with both long-term and short-term decoders that operate only during training and therefore add no inference overhead. Because the input radar data are sparse, multimodal historical multi-frame input is introduced to facilitate more accurate BEV feature learning. Lastly, a millimeter-wave-optimized ray denoising method is proposed, which uses the current frame's millimeter-wave radar point cloud as prior knowledge to assist proposal generation, thereby enhancing the camera query feature representation. The proposed algorithm is trained and validated on the large-scale public dataset nuScenes, reaching an NDS of 56.7% with a ResNet-50 backbone.
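The RCS-aware BEV encoder can be pictured as scattering each radar point's feature vector over a BEV neighborhood whose size grows with the point's RCS, so strong reflectors (typically large objects) occupy a larger footprint. The sketch below is a minimal illustration of this idea only, not the paper's implementation: the function name rcs_aware_scatter, the grid parameters, and the Manhattan-distance weighting are all assumptions introduced here for clarity.

import torch

def rcs_aware_scatter(points_xy, rcs, feats, grid_size=128, cell=0.5):
    # points_xy: (N, 2) ego-frame x/y in metres; rcs: (N,); feats: (N, C)
    # Illustrative sketch of an RCS-aware BEV encoder: each point's feature
    # is spread over a footprint whose radius scales with its normalised RCS.
    C = feats.shape[1]
    bev = torch.zeros(C, grid_size, grid_size)
    rcs_n = (rcs - rcs.min()) / (rcs.max() - rcs.min() + 1e-6)  # normalise RCS to [0, 1]
    for p in range(points_xy.shape[0]):
        cx = int(points_xy[p, 0] / cell) + grid_size // 2
        cy = int(points_xy[p, 1] / cell) + grid_size // 2
        if not (0 <= cx < grid_size and 0 <= cy < grid_size):
            continue  # point falls outside the BEV grid
        r = 1 + int(round(2 * rcs_n[p].item()))  # footprint radius grows with RCS
        for ix in range(max(cx - r, 0), min(cx + r + 1, grid_size)):
            for iy in range(max(cy - r, 0), min(cy + r + 1, grid_size)):
                w = 1.0 / (1 + abs(ix - cx) + abs(iy - cy))  # simple distance decay
                bev[:, iy, ix] += w * feats[p]
    return bev

# Toy usage: 40 radar points with 8-channel features on a 128 x 128 BEV grid.
pts = 60.0 * (torch.rand(40, 2) - 0.5)
bev = rcs_aware_scatter(pts, torch.randn(40), torch.randn(40, 8))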
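Similarly, the radar-prior step of the ray denoising method can be read as seeding a portion of the detector's object queries at current-frame radar returns rather than at purely random positions. The snippet below is a hypothetical sketch of that seeding under the same assumptions as above; radar_prior_queries, the query count, and the jitter scale are illustrative names and values, not the paper's.

def radar_prior_queries(radar_xy, num_queries=900, jitter=0.5):
    # Seed queries at current-frame radar returns (with small positional noise)
    # and fill the remainder uniformly over a +/-50 m range around the ego vehicle.
    n = min(radar_xy.shape[0], num_queries)
    seeded = radar_xy[:n] + jitter * torch.randn(n, 2)
    rest = 100.0 * (torch.rand(num_queries - n, 2) - 0.5)
    return torch.cat([seeded, rest], dim=0)

queries = radar_prior_queries(pts)  # (900, 2) BEV reference points for the decoder

Seeding queries near radar returns gives the camera branch physically grounded proposals to refine, which is one plausible way a radar prior can strengthen the query representation described in the abstract.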

Key words: autonomous driving, deep learning, object detection, multi-sensor fusion