Automotive Engineering ›› 2024, Vol. 46 ›› Issue (9): 1608-1616. doi: 10.19562/j.chinasae.qcgc.2024.09.008

A LiDAR-Based Dynamic Driving Scene Multi-task Segmentation Network

Hai Wang1, Jianguo Li1, Yingfeng Cai2, Long Chen2

  1. School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013
  2. Automotive Engineering Research Institute, Jiangsu University, Zhenjiang 212013
  • Received: 2024-01-23  Revised: 2024-04-23  Online: 2024-09-25  Published: 2024-09-19
  • Corresponding author: Hai Wang  E-mail: wanghai1019@163.com
  • Supported by: National Key R&D Program of China (2023YFB2504401)

Abstract:

In autonomous driving scene understanding, accurate segmentation of the drivable area and of dynamic and static objects is essential for subsequent local motion planning and motion control. However, current general-purpose semantic segmentation methods based on LiDAR point clouds can neither achieve real-time, robust prediction on vehicle-side edge computing devices nor predict the current motion state of objects. To solve this problem, this paper proposes MultiSegNet, a multi-task network that segments the drivable area together with dynamic and static objects. The network takes the depth image output by the LiDAR and the residual images derived from it as input representations encoding spatial and motion features, thereby avoiding direct processing of unordered, high-density point clouds. To handle the large variation in the number of targets across different viewing directions of the depth image, a variable-resolution grouped input strategy is proposed, which reduces the computational cost of the network while improving its segmentation accuracy. To match the convolutional receptive field to targets of different scales, a depth-value-guided hierarchical dilated convolution module is proposed. Furthermore, to effectively associate and fuse the spatial positions and poses of objects across different time steps, a spatio-temporal motion feature enhancement network is proposed. The effectiveness of the proposed MultiSegNet is verified on the large-scale point cloud driving scene datasets SemanticKITTI and nuScenes. The results show that the segmentation IoU for the drivable area, static objects and dynamic objects reaches 98%, 97% and 70%, respectively, outperforming mainstream networks, while achieving real-time inference on edge computing devices.

Key words: autonomous driving, LiDAR, multi-task point cloud segmentation network, dynamic object segmentation
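
As a rough illustration of the residual-image representation described in the abstract, the sketch below follows the common recipe for LiDAR moving-object segmentation: a past scan is ego-motion compensated into the current sensor frame, both scans are spherically projected to depth (range) images, and the per-pixel normalized range difference forms one residual channel. The 64×2048 resolution, the +3°/−25° vertical field of view, and the function names are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def range_projection(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Spherically project an (N, 3) point cloud into an (H, W) depth (range)
    image. Resolution and vertical field of view are illustrative assumptions."""
    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    r = np.linalg.norm(points, axis=1)                        # range per point
    yaw = np.arctan2(points[:, 1], points[:, 0])              # azimuth
    pitch = np.arcsin(points[:, 2] / np.maximum(r, 1e-8))     # elevation
    u = (0.5 * (1.0 - yaw / np.pi) * W).astype(np.int64) % W  # column index
    v = (1.0 - (pitch - fov_down) / (fov_up - fov_down)) * H
    v = np.clip(v, 0, H - 1).astype(np.int64)                 # row index
    img = np.full((H, W), -1.0, dtype=np.float32)
    order = np.argsort(r)[::-1]       # draw far points first so near points win
    img[v[order], u[order]] = r[order]
    return img

def residual_image(curr_points, past_points, T_past_to_curr):
    """Normalized per-pixel range difference between the current scan and a
    past scan reprojected into the current frame (one channel per past scan)."""
    past_h = np.hstack([past_points, np.ones((len(past_points), 1))])
    past_in_curr = (T_past_to_curr @ past_h.T).T[:, :3]       # ego-motion compensation
    r_curr = range_projection(curr_points)
    r_past = range_projection(past_in_curr)
    valid = (r_curr > 0) & (r_past > 0)
    res = np.zeros_like(r_curr)
    res[valid] = np.abs(r_curr[valid] - r_past[valid]) / r_curr[valid]
    return res                        # large values hint at moving objects
```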
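The variable-resolution grouped input strategy can be pictured as partitioning the depth image by azimuth sector and giving each sector its own horizontal resolution, so object-dense viewing directions keep fine detail while sparse ones are downsampled to cut computation. The following minimal PyTorch sketch makes that concrete; the sector boundaries and scale factors are invented placeholders rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def variable_resolution_groups(range_img, groups):
    """Split a (B, C, H, W) depth image into azimuth sectors and resample each
    sector's width by its own factor. `groups` lists (col_start, col_end,
    width_scale) tuples; the values passed below are placeholders."""
    outs = []
    for c0, c1, scale in groups:
        sector = range_img[..., c0:c1]
        w = max(1, int(round((c1 - c0) * scale)))
        # Nearest-neighbour resampling keeps range values valid
        # (no blending across depth discontinuities).
        outs.append(F.interpolate(sector, size=(sector.shape[-2], w), mode="nearest"))
    return outs

# Example: full resolution for the object-dense frontal sector (the middle
# columns under the projection above), half resolution for the sparser sides.
img = torch.rand(1, 5, 64, 2048)   # e.g. range, x, y, z, intensity channels
parts = variable_resolution_groups(
    img, groups=[(0, 512, 0.5), (512, 1536, 1.0), (1536, 2048, 0.5)])
```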
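One plausible reading of the depth-value-guided hierarchical dilated convolution is a set of parallel dilated branches fused by per-pixel weights predicted from the depth channel, so the effective receptive field adapts to how large a target appears at its range. The class name, branch count, and dilation rates (1, 2, 4) below are invented for illustration and should not be taken as the paper's design.

```python
import torch
import torch.nn as nn

class DepthGuidedDilatedConv(nn.Module):
    """Hypothetical sketch: parallel 3x3 dilated branches fused by per-pixel
    weights predicted from the 1-channel depth (range) image."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in dilations)
        # Tiny gating head: one fusion weight per branch, per pixel.
        self.gate = nn.Sequential(nn.Conv2d(1, len(dilations), 1), nn.Softmax(dim=1))

    def forward(self, x, depth):
        # x: (B, C, H, W) features; depth: (B, 1, H, W) range image.
        w = self.gate(depth)                                       # (B, K, H, W)
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, C', H, W)
        return (w.unsqueeze(2) * feats).sum(dim=1)                 # depth-weighted sum

# Usage (shapes only):
# y = DepthGuidedDilatedConv(32, 64)(torch.randn(2, 32, 64, 2048),
#                                    torch.rand(2, 1, 64, 2048))
```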