Automotive Engineering (汽车工程) ›› 2025, Vol. 47 ›› Issue (6): 1122-1132. doi: 10.19562/j.chinasae.qcgc.2025.06.011



MSF-Diffuser: A Multi-sensor Adaptive Fusion Autonomous Driving Method Based on Diffusion Model Under BEV

Mingchen Wang1, Hai Wang1, Yingfeng Cai2, Long Chen2, Yicheng Li2

  1. School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013, China
  2. Institute of Automotive Engineering, Jiangsu University, Zhenjiang 212013, China
  • Received: 2024-10-29; Revised: 2025-02-03; Online: 2025-06-25; Published: 2025-06-20
  • Contact: Hai Wang, E-mail: wanghai1019@163.com
  • Funding: Paper of the 27th Annual Meeting of the China Association for Science and Technology; National Natural Science Foundation of China (52225212)


Abstract:

Autonomous driving algorithms are a major research focus in the field of intelligent vehicles. Currently, to achieve panoramic autonomous driving, most approaches in China adopt multi-sensor fusion. However, existing solutions suffer from low sensor utilization and poorly designed fusion strategies. To address these problems, this paper proposes an autonomous driving framework based on multi-sensor fusion (camera + LiDAR + millimeter-wave radar) under a bird's-eye view (BEV). In this framework, millimeter-wave radar point cloud features are extracted through dual point and velocity encoding combined with feature interaction, which improves the utilization of radar information and facilitates subsequent fusion. In the fusion module, an LSTM stores the features of each sensor modality as well as the fused BEV features, enabling the computation of a consistency loss between the features of different modalities and a continuity loss between the fused BEV features and historical frames, which makes feature fusion smoother and more precise. Finally, a diffusion model is introduced and a Multi-modal U-Net is proposed for denoising, improving the robustness of the planned trajectories. Extensive experiments are conducted in the CARLA simulator on the authoritative Longest-06 and Town-05 Long benchmarks, where the method achieves Driving Scores (DS) of 73.80±1.01 and 73.7±1.3, respectively. Compared with existing autonomous driving methods, the proposed approach achieves better panoramic autonomous driving with superior performance and flexibility.
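The radar branch described above can be pictured as two per-point encoders whose outputs exchange information before fusion. Below is a minimal PyTorch sketch of dual point/velocity encoding with feature interaction; the module structure, feature dimensions, and the use of cross-attention as the interaction mechanism are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class RadarDualEncoder(nn.Module):
    """Hypothetical dual point/velocity encoder for radar point clouds."""
    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        # Branch 1: encode per-point geometry (e.g., x, y, z, RCS).
        self.point_mlp = nn.Sequential(
            nn.Linear(4, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        # Branch 2: encode per-point velocity (e.g., compensated vx, vy).
        self.vel_mlp = nn.Sequential(
            nn.Linear(2, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        # Feature interaction: point features attend to velocity features.
        self.interact = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, points: torch.Tensor, vels: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 4), vels: (B, N, 2) -> radar tokens: (B, N, d_model)
        p = self.point_mlp(points)
        v = self.vel_mlp(vels)
        x, _ = self.interact(query=p, key=v, value=v)
        return self.norm(p + x)  # residual connection over the interaction

enc = RadarDualEncoder()
tokens = enc(torch.randn(2, 64, 4), torch.randn(2, 64, 2))
print(tokens.shape)  # torch.Size([2, 64, 128])
```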
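The two auxiliary objectives of the fusion module can be sketched in the same spirit: a consistency term that pulls the per-sensor BEV features toward agreement, and a continuity term that ties the fused BEV feature to the temporal context carried by an LSTM over historical frames. The cosine and MSE loss forms below are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def consistency_loss(cam_bev, lidar_bev, radar_bev):
    # Encourage the three per-modality BEV feature maps to agree pairwise.
    def dist(a, b):
        return 1.0 - F.cosine_similarity(a.flatten(1), b.flatten(1), dim=1).mean()
    return (dist(cam_bev, lidar_bev) + dist(cam_bev, radar_bev)
            + dist(lidar_bev, radar_bev)) / 3.0

def continuity_loss(fused_bev, temporal_state):
    # Penalize abrupt changes between the current fused BEV feature and
    # the LSTM summary of previous frames.
    return F.mse_loss(fused_bev.flatten(1), temporal_state.flatten(1))

# Toy usage with small feature maps (all shapes are illustrative).
B, C, H, W = 2, 8, 8, 8
cam, lid, rad = (torch.randn(B, C, H, W) for _ in range(3))
fused = (cam + lid + rad) / 3.0                 # stand-in for the fusion module
lstm = torch.nn.LSTM(C * H * W, C * H * W, batch_first=True)
hist, _ = lstm(fused.flatten(1).unsqueeze(1))   # stores fused BEV over time
loss = consistency_loss(cam, lid, rad) + continuity_loss(fused, hist.squeeze(1))
```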
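For the planning head, the abstract describes a diffusion model whose denoiser is a Multi-modal U-Net. The sketch below shows a generic DDPM-style sampling loop that denoises a waypoint sequence conditioned on a fused BEV vector; MultiModalUNet is a hypothetical stand-in for the paper's network, and the linear noise schedule is an assumption.

```python
import torch
import torch.nn as nn

class MultiModalUNet(nn.Module):
    # Placeholder denoiser: predicts the noise in a (B, T, 2) waypoint
    # sequence given the diffusion timestep and a BEV condition vector.
    def __init__(self, horizon: int = 8, cond_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.horizon = horizon
        self.net = nn.Sequential(
            nn.Linear(horizon * 2 + cond_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * 2))

    def forward(self, traj, t, cond):
        x = torch.cat([traj.flatten(1), cond, t.float().unsqueeze(1)], dim=1)
        return self.net(x).view(-1, self.horizon, 2)

@torch.no_grad()
def sample_trajectory(model, cond, steps: int = 50):
    beta = torch.linspace(1e-4, 0.02, steps)             # linear noise schedule
    alpha = 1.0 - beta
    alpha_bar = torch.cumprod(alpha, dim=0)
    traj = torch.randn(cond.shape[0], model.horizon, 2)  # start from pure noise
    for t in reversed(range(steps)):
        eps = model(traj, torch.full((cond.shape[0],), t), cond)
        # DDPM posterior mean, then add noise except at the last step.
        mean = (traj - beta[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps) / torch.sqrt(alpha[t])
        noise = torch.randn_like(traj) if t > 0 else torch.zeros_like(traj)
        traj = mean + torch.sqrt(beta[t]) * noise
    return traj

cond = torch.randn(2, 128)                  # hypothetical fused BEV condition
print(sample_trajectory(MultiModalUNet(), cond).shape)  # torch.Size([2, 8, 2])
```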

Key words: autonomous driving, multi-sensor fusion, feature interaction, diffusion model