Automotive Engineering, 2024, Vol. 46, Issue 2: 222-229. doi: 10.19562/j.chinasae.qcgc.2024.02.004


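  • Funding:
    National Natural Science Foundation of China (82171012); Natural Science Foundation of Anhui Province (2208085MF171); Anhui Province New Energy Vehicle and Intelligent Connected Vehicle Innovation Project (JZ2021AFKJ0002); Automotive Standardization Public Welfare Open Project (CATARC-Z-2022-01350)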

Two-Stage Multimodal Fusion Networks Based on Virtual Point Clouds

Teng Cheng1,2,3, Hao Ni1,2,3, Qiang Zhang1,2,3,4, Wenchong Wang4, Qin Shi1,2,3

  1. Hefei University of Technology, Key Laboratory for Automated Vehicle Safety Technology of Anhui Province, Hefei 230009
  2. Engineering Research Center for Intelligent Transportation and Cooperative Vehicle-Infrastructure of Anhui Province, Hefei 230000
  3. School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230000
  4. Chery Automobile Co., Ltd., Wuhu 241000
  • Received: 2023-05-10 Revised: 2023-07-30 Online: 2024-02-25 Published: 2024-02-23
  • Contact: Teng Cheng E-mail: cht616@hfut.edu.cn


Abstract:

To address the impact of point cloud sparsity and disorder on target detection accuracy, this paper proposes VPC-VoxelNet, a two-stage multimodal fusion network based on virtual point clouds. First, virtual point clouds are constructed from the target information detected in images, which increases point cloud density and improves the representation of target features. Second, the point cloud feature dimension is extended to distinguish real points from virtual points, and voxels with confidence encoding are used to strengthen the correlation among points. Finally, the loss function is designed using the scale factor of the virtual point clouds, which adds supervised training of the image detection stage, improves the training efficiency of the two-stage network, and avoids the error accumulation that affects two-stage end-to-end models. VPC-VoxelNet is evaluated on the KITTI dataset. Its detection accuracy exceeds that of classical 3D point cloud detection networks and some multi-sensor fusion networks, reaching 86.9% for vehicle detection.

Key words: target detection, multimodal perception, virtual point cloud, loss function
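
The second step of the abstract, extending the point feature dimension so that real and virtual points can be told apart, can be illustrated with a minimal sketch. The abstract does not give the actual encoding used in VPC-VoxelNet, so everything here is an assumption made for illustration: the function names `tag_points` and `virtual_ratio`, the two extra channels (a 0/1 virtual flag and a confidence value), the confidence of 1.0 assigned to real points, and the sample coordinates.

```python
from typing import List, Tuple

# Assumed input layout: x, y, z, reflectance.
Point = Tuple[float, float, float, float]
# Assumed output layout: x, y, z, reflectance, is_virtual, confidence.
TaggedPoint = Tuple[float, float, float, float, float, float]


def tag_points(
    real: List[Point],
    virtual: List[Point],
    virtual_conf: List[float],
) -> List[TaggedPoint]:
    """Merge real and virtual points, appending two hypothetical channels."""
    if len(virtual) != len(virtual_conf):
        raise ValueError("one confidence value per virtual point is required")
    # Real LiDAR points: flag 0.0, confidence 1.0 (an assumption, not from the paper).
    merged: List[TaggedPoint] = [(*p, 0.0, 1.0) for p in real]
    # Virtual points: flag 1.0, confidence taken from the image detector.
    merged += [(*p, 1.0, c) for p, c in zip(virtual, virtual_conf)]
    return merged


def virtual_ratio(points: List[TaggedPoint]) -> float:
    """Fraction of points flagged as virtual; 0.0 for an empty cloud."""
    if not points:
        return 0.0
    return sum(p[4] for p in points) / len(points)


# Made-up sample data: three real points and one virtual point.
real_pts = [(1.0, 2.0, 0.5, 0.3), (1.1, 2.1, 0.5, 0.4), (5.0, 0.0, 0.2, 0.1)]
virtual_pts = [(1.05, 2.05, 0.5, 0.0)]
cloud = tag_points(real_pts, virtual_pts, [0.9])
ratio = virtual_ratio(cloud)
```

A ratio of this kind is one plausible input to a loss weighting such as the scale factor mentioned in the abstract, but how the paper computes and applies that factor is not stated here.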