Automotive Engineering (汽车工程) ›› 2023, Vol. 45 ›› Issue (7): 1112-1122. doi: 10.19562/j.chinasae.qcgc.2023.07.002

Special Topic: Intelligent and Connected Vehicle Technology - Perception & HMI & Evaluation (2023)

• Special Topic: Key Technologies for Vehicle Intelligence •

  • Funding: National Natural Science Foundation of China (52072054); Chongqing Key Special Project for Technology Innovation and Application Development (cstc2021jscx-cylh0026); Open Fund of the Chongqing Industry and Information Technology Key Laboratory of Automotive Active Safety Testing Technology (2021KFKT01)

Autonomous Driving 3D Object Detection Based on Cascade YOLOv7

Dongyu Zhao, Shuen Zhao

  1. School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing 400074
  • Received: 2022-12-14 Revised: 2023-01-24 Online: 2023-07-25 Published: 2023-07-25
  • Contact: Shuen Zhao E-mail: zse0916@163.com


Abstract:

To address the problems of incomplete feature information and excessive point cloud search volume in 3D object detection methods based on images and raw point clouds, a 3D object detection algorithm based on cascade YOLOv7 is proposed on the basis of the Frustum PointNet (F-PointNet) structure, fusing RGB images and point cloud data of the scene surrounding the autonomous vehicle. Firstly, a frustum estimation model based on YOLOv7 is constructed to extend the RGB-image region of interest (RoI) longitudinally into 3D space. Then, the object points and background points within each frustum are segmented by PointNet++. Finally, an amodal 3D box estimation network outputs the length, width, height, and heading of each object, interpreting the natural positional relationships between objects. Test results and ablation experiments on the KITTI public dataset show that, compared with the benchmark network, the cascade YOLOv7 model shortens inference time by 40 ms per frame, and the mean average precision for objects at the moderate and hard occlusion difficulty levels increases by 8.77% and 9.81%, respectively.
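The frustum-estimation step described above follows the standard F-PointNet recipe: each 2D RoI from the image detector is lifted into a 3D frustum by keeping only the LiDAR points whose camera projection falls inside the box. The sketch below illustrates that crop for a KITTI-style calibration; the function name and the identity-transform convention are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def frustum_points(points_lidar, box2d, P, Tr_velo_to_cam):
    """Keep LiDAR points whose image projection falls inside a 2D RoI.

    points_lidar   : (N, 3) XYZ points in the LiDAR frame.
    box2d          : (xmin, ymin, xmax, ymax) detector box in pixels.
    P              : (3, 4) camera projection matrix (e.g. KITTI P2).
    Tr_velo_to_cam : (4, 4) rigid transform from LiDAR to camera frame.
    """
    # Homogeneous coordinates, then transform into the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (Tr_velo_to_cam @ pts_h.T).T           # (N, 4)

    in_front = pts_cam[:, 2] > 0                     # discard points behind the camera

    # Project onto the image plane and dehomogenize.
    uvw = (P @ pts_cam.T).T                          # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]

    xmin, ymin, xmax, ymax = box2d
    in_box = (
        (uv[:, 0] >= xmin) & (uv[:, 0] <= xmax) &
        (uv[:, 1] >= ymin) & (uv[:, 1] <= ymax)
    )
    return points_lidar[in_front & in_box]
```

The points returned by this crop are exactly what the downstream PointNet++ segmentation stage consumes, which is why a faster, more accurate 2D detector (here, YOLOv7) improves the whole cascade.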

Key words: 3D object detection, YOLOv7, F-PointNet, multi-sensor information fusion, autonomous driving