Automotive Engineering ›› 2025, Vol. 47 ›› Issue (5): 829-838. doi: 10.19562/j.chinasae.qcgc.2025.05.004



Multi-object Detection Algorithm Based on Camera and Radar Fusion for Autonomous Driving Scenarios

Chenyu Liu1, Hai Wang1, Yingfeng Cai2, Long Chen2

  1. School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013
  2. Institute of Automotive Engineering, Jiangsu University, Zhenjiang 212013
  • Received: 2024-11-07  Revised: 2025-01-07  Online: 2025-05-25  Published: 2025-05-20
  • Contact: Hai Wang  E-mail: wanghai1019@163.com
  • Supported by: the National Key R&D Program of China (2023YFB2504401) and the Yangzhou Industry Foresight and Key Core Technology Project (YZ2024033)


Abstract:

Autonomous driving systems demand efficient and accurate perception, but relying solely on cameras makes it difficult to achieve high-precision and robust 3D object detection. An effective way to address this problem is to combine cameras with cost-effective millimeter-wave radar sensors for more reliable multimodal 3D object detection, which not only improves the accuracy of environmental perception but also enhances the robustness and safety of the system. In this paper, an autonomous driving perception algorithm based on the fusion of millimeter-wave radar and cameras, named HPR-Det (historical pillar of ray camera-radar fusion bird's eye view for 3D object detection), is proposed. Specifically, a radar BEV (bird's eye view) feature extraction module called Radar-PRANet (radar point RCS attention net) is first designed. It comprises a dual-stream radar backbone that extracts radar features in two representations and an RCS-aware BEV encoder that scatters the radar features into the BEV space according to radar-specific RCS characteristics. Secondly, the historical multi-frame prediction paradigm HrOP (historical radar of object prediction) is adopted, with long-term and short-term decoders that run only during training and therefore introduce no additional inference overhead. Because the network's input data are sparse, multimodal historical multi-frame input is introduced to guide more accurate BEV feature learning. Finally, a millimeter-wave-optimized ray denoising method is proposed, which uses the current frame's millimeter-wave radar point cloud as prior information to assist proposal generation, thereby enhancing the query feature representation for the camera. The proposed algorithm is trained and validated on the large-scale public dataset nuScenes, achieving an NDS of 56.7% with a ResNet-50 backbone.
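To make the RCS-aware BEV encoding step concrete, the following PyTorch sketch shows one way radar point features could be gated by their RCS values and scattered into a BEV grid. It is a minimal illustration under assumed conventions: the function name rcs_aware_bev_scatter, the 128×128 grid, the ±51.2 m range, and the fixed sigmoid gate are all assumptions made for the sketch, not the paper's actual Radar-PRANet implementation.

```python
import torch

def rcs_aware_bev_scatter(xy, feats, rcs, grid=(128, 128),
                          x_range=(-51.2, 51.2), y_range=(-51.2, 51.2)):
    """Scatter per-point radar features into a BEV grid, gating each point
    by its RCS so that strong reflectors contribute more to their cell.

    xy:    (N, 2) point coordinates in metres (x, y)
    feats: (N, C) per-point radar features from a radar backbone
    rcs:   (N,)   radar cross-section values
    Returns a (C, H, W) BEV feature map.
    """
    H, W = grid
    # Metric coordinates -> integer BEV cell indices.
    xi = ((xy[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * W).floor().long()
    yi = ((xy[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * H).floor().long()
    keep = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
    xi, yi, feats, rcs = xi[keep], yi[keep], feats[keep], rcs[keep]

    # Illustrative RCS gate (a fixed sigmoid here; the paper's encoder
    # presumably learns this weighting).
    weighted = feats * torch.sigmoid(rcs).unsqueeze(1)        # (N, C)

    bev = feats.new_zeros(feats.shape[1], H * W)
    # Accumulate the features of all points that fall into the same cell.
    bev.index_add_(1, yi * W + xi, weighted.t())
    return bev.view(feats.shape[1], H, W)

# Example: 200 random radar points with 32-dim features.
xy = (torch.rand(200, 2) - 0.5) * 102.4
bev = rcs_aware_bev_scatter(xy, torch.randn(200, 32), torch.randn(200))
print(bev.shape)  # torch.Size([32, 128, 128])
```

Collisions (multiple points landing in the same cell) are resolved here by summation via index_add_; a learned encoder could instead use attention pooling, which is closer in spirit to the RCS attention that the module's name suggests.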

Key words: autonomous driving, deep learning, object detection, multi-sensor fusion