Semantic Segmentation Method of LiDAR Point Cloud Based on 3D Conical Grid

doi:10.19562/j.chinasae.qcgc.2022.08.007


Abstract:

Semantic segmentation of LiDAR point clouds is an important branch of road scene perception in autonomous driving systems. Mainstream methods convert point clouds into regular 2D images or Cartesian grids, which reduces the computation caused by their unstructured nature. However, 2D image-based methods inevitably alter the 3D geometric topology, while Cartesian grid-based methods ignore the density inconsistency of outdoor LiDAR point clouds. Both limitations restrict segmentation ability, especially for small objects such as pedestrians and bicycles. This paper therefore proposes a semantic segmentation method for LiDAR point clouds based on a 3D conical grid and a 3D sparse convolution network (Spconv3D). The conical grid partition addresses the sparsity and density inconsistency of point clouds, and a re-parameterized Spconv3D is designed to speed up model inference. The method is evaluated on two large-scale datasets, SemanticKITTI and nuScenes. Compared with the latest state-of-the-art methods, it improves mIoU by 1.3 and 0.8 percentage points respectively, with a marked improvement in small-object recognition.

Key words: autonomous driving, LiDAR point cloud, semantic segmentation, 3D conical grid, re-parameterization, 3D sparse convolution network
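The abstract does not define exactly how a point is assigned to a conical cell. The following is a minimal sketch built on our own assumption, which is not the paper's stated definition. Under that assumption, the grid quantizes each point's horizontal range ρ, azimuth θ and elevation angle φ = atan2(z, ρ). Cells that share an elevation bin then sweep a cone around the sensor, and each cell grows with distance. Growing cells are one way to follow the falling point density of a LiDAR scan. The function names `conical_voxel_index` and `partition` and all bin counts and ranges are hypothetical.

```python
import math
from collections import defaultdict

def conical_voxel_index(x, y, z, rho_max=50.0, n_rho=480, n_theta=360,
                        phi_min=-0.45, phi_max=0.10, n_phi=32):
    """Map one point to a (rho, theta, phi) cell index.

    Hypothetical conical partition: horizontal range rho, azimuth theta and
    elevation angle phi = atan2(z, rho). All bin settings are illustrative.
    """
    rho = math.hypot(x, y)
    theta = math.atan2(y, x)                 # azimuth in [-pi, pi]
    phi = math.atan2(z, rho)                 # elevation angle of the laser ray
    i = min(int(rho / rho_max * n_rho), n_rho - 1)
    j = min(int((theta + math.pi) / (2 * math.pi) * n_theta), n_theta - 1)
    k = int((phi - phi_min) / (phi_max - phi_min) * n_phi)
    k = max(0, min(k, n_phi - 1))            # clamp rays outside the elevation range
    return i, j, k

def partition(points):
    """Group point indices by cell; only occupied cells are stored (sparse)."""
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        cells[conical_voxel_index(x, y, z)].append(idx)
    return dict(cells)
```

For example, `partition([(5.03, 0.0, -1.0), (5.05, 0.0, -1.0), (40.0, 0.0, -8.0)])` places the two nearby points in one cell and the distant point in another. A sparse 3D convolution would then operate only on the occupied cells.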

Runhui Huang, Likun Hu, Mingfang Su, Daye Xu, Aoran Chen. Semantic Segmentation Method of LiDAR Point Cloud Based on 3D Conical Grid[J]. Automotive Engineering, 2022, 44(8): 1173-1182.

Figures/tables: 12 (Figures 1-7; Tables 1-5)

References (28)

1	BEHLEY J, GARBADE M, MILIOTO A, et al. SemanticKITTI: a dataset for semantic scene understanding of LiDAR sequences[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2019: 9297-9307.
2	CAESAR H, BANKITI V, LANG A H, et al. nuScenes: a multimodal dataset for autonomous driving[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020: 11621-11631.
3	HACKEL T, SAVINOV N, LADICKY L, et al. Semantic3D.net: a new large-scale point cloud classification benchmark[J]. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2017, IV-1/W1: 91-98.
4	WU B, WAN A, YUE X, et al. SqueezeSeg: convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud[C]. 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018: 1887-1893.
5	WU B, ZHOU X, ZHAO S, et al. SqueezeSegV2: improved model structure and unsupervised domain adaptation for road-object segmentation from a LiDAR point cloud[C]. 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019: 4376-4382.
6	XU C, WU B, WANG Z, et al. SqueezeSegV3: spatially-adaptive convolution for efficient point-cloud segmentation[C]. European Conference on Computer Vision (ECCV). Springer, Cham, 2020: 1-19.
7	MILIOTO A, VIZZO I, BEHLEY J, et al. RangeNet++: fast and accurate LiDAR semantic segmentation[C]. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019: 4213-4220.
8	REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv preprint, 2018.
9	ZHANG Y, ZHOU Z, DAVID P, et al. PolarNet: an improved grid representation for online LiDAR point clouds semantic segmentation[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020: 9601-9610.
10	MATURANA D, SCHERER S. VoxNet: a 3D convolutional neural network for real-time object recognition[C]. 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2015: 922-928.
11	GRAHAM B, ENGELCKE M, VAN DER MAATEN L. 3D semantic segmentation with submanifold sparse convolutional networks[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2018: 9224-9232.
12	CHOY C, GWAK J Y, SAVARESE S. 4D spatio-temporal ConvNets: Minkowski convolutional neural networks[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019: 3075-3084.
13	ZHU X, ZHOU H, WANG T, et al. Cylindrical and asymmetrical 3D convolution networks for LiDAR segmentation[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021: 9939-9948.
14	CHENG R, RAZANI R, TAGHAVI E, et al. (AF)²-S3Net: attentive feature fusion with adaptive feature selection for sparse semantic segmentation network[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021: 12547-12556.
15	QI C R, SU H, MO K, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 652-660.
16	QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]. Proceedings of the 31st International Conference on Neural Information Processing Systems. Curran Associates Inc., 2017: 5099-5108.
17	THOMAS H, QI C R, DESCHAUD J E, et al. KPConv: flexible and deformable convolution for point clouds[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2019: 6411-6420.
18	HU Q, YANG B, XIE L, et al. RandLA-Net: efficient semantic segmentation of large-scale point clouds[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020: 11108-11117.
19	LIU Y, FAN B, MENG G, et al. DensePoint: learning densely contextual representation for efficient point cloud processing[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2019: 5239-5248.
20	HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 4700-4708.
21	DING X, ZHANG X, MA N, et al. RepVGG: making VGG-style ConvNets great again[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021: 13733-13742.
22	LI J, LIU Y, YUAN X, et al. Depth based semantic scene completion with position importance aware loss[J]. IEEE Robotics and Automation Letters, 2019, 5(1): 219-226.
23	FAN Y, LYU S, YING Y, et al. Learning with average top-k loss[J]. Advances in Neural Information Processing Systems, 2017, 30.
24	BERMAN M, TRIKI A R, BLASCHKO M B. The Lovász-Softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2018: 4413-4421.
25	CORTINHAL T, TZELEPIS G, ERDAL AKSOY E. SalsaNext: fast, uncertainty-aware semantic segmentation of LiDAR point clouds[C]. International Symposium on Visual Computing. Springer, Cham, 2020: 207-222.
26	ZHANG F, FANG J, WAH B, et al. Deep FusionNet for point cloud semantic segmentation[C]. Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XXIV. Springer International Publishing, 2020: 644-663.
27	YAN X, GAO J, LI J, et al. Sparse single sweep LiDAR point cloud segmentation via learning contextual shape priors from scene completion[J]. arXiv preprint, 2020.
28	LIONG V E, NGUYEN T N T, WIDJAJA S, et al. AMVNet: assertion-based multi-view fusion network for LiDAR semantic segmentation[J]. arXiv preprint, 2020.

SemanticKITTI: mIoU and per-class IoU (%)
Method	mIoU	car	bicycle	motorcycle	truck	other-vehicle	person	bicyclist	motorcyclist	road	parking	sidewalk	other-ground	building	fence	vegetation	trunk	terrain	pole	traffic-sign
Darknet53 [1]	49.9	86.4	24.5	32.7	25.5	22.6	36.2	33.6	4.7	91.8	64.8	74.6	27.9	84.1	55.0	78.3	50.1	64.0	38.9	52.2
RandLA-Net [18]	50.3	94.0	19.8	21.4	42.7	38.7	47.5	48.8	4.6	90.4	56.9	67.9	15.5	81.1	49.7	78.3	60.3	59.0	44.2	38.1
RangeNet++ [7]	52.2	91.4	25.7	34.4	25.7	23.0	38.3	38.8	4.8	91.8	65.0	75.2	27.8	87.4	58.6	80.5	55.1	64.6	47.9	55.9
PolarNet [9]	54.3	93.8	40.3	30.1	22.9	28.5	43.2	40.2	5.6	90.8	61.7	74.4	21.7	90.0	61.3	84.0	65.5	67.8	51.8	57.5
MinkNet42 [12]	54.3	94.3	23.1	26.2	26.1	26.7	43.1	36.4	7.9	91.1	63.8	69.7	29.3	92.7	57.1	83.7	68.4	64.7	57.3	60.1
KPConv [17]	58.8	92.5	38.7	36.5	29.6	33.0	45.6	46.2	20.1	91.7	63.4	74.8	26.4	89.0	59.4	82.0	58.7	65.4	49.6	58.9
SalsaNext [25]	59.5	91.9	48.3	38.6	38.9	31.9	60.2	59.0	19.4	91.7	63.7	75.8	29.1	90.2	64.2	81.8	63.6	66.5	54.3	62.1
FusionNet [26]	61.3	95.3	47.5	37.7	41.8	34.5	59.5	56.8	11.9	91.8	68.8	77.1	30.8	92.5	69.4	84.5	69.8	68.5	60.4	66.5
Cylinder3D [13]	67.8	97.1	67.6	64.0	59.0	58.6	73.9	67.9	36.0	91.4	65.1	75.5	32.3	91.0	66.5	85.4	71.8	68.5	62.6	65.6
(AF)²-S3Net [14]	69.7	94.5	65.4	86.8	39.2	41.1	80.7	80.4	74.3	91.3	68.8	72.5	53.5	87.9	63.2	70.2	68.5	53.7	61.5	71.0
Proposed	71.0	97.3	73.5	72.1	49.3	58.5	79.8	82.8	23.6	92.9	73.0	79.7	27.1	91.8	68.5	86.9	75.8	72.0	70.0	75.1
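Each mIoU value is the mean of the per-class IoU values. Under the standard benchmark definition, IoU_c = TP_c / (TP_c + FP_c + FN_c). The minimal sketch below computes IoU from a confusion matrix. The helper names are hypothetical, and the 3-class matrix is made up for illustration. The sketch also checks that the last row of the table above (the proposed method) averages to its reported mIoU.

```python
def per_class_iou(conf):
    """conf[i][j] = number of points with ground truth i predicted as class j."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # class-c points predicted as something else
        fp = sum(conf[r][c] for r in range(n)) - tp   # other points predicted as class c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious

def mean_iou(ious):
    valid = [v for v in ious if v == v]  # NaN != NaN, so classes absent everywhere are skipped
    return sum(valid) / len(valid)

# Toy 3-class confusion matrix (illustrative numbers only).
conf = [[50, 5, 0],
        [10, 30, 0],
        [0, 0, 5]]
ious = per_class_iou(conf)   # 50/65, 30/45, 5/5

# Per-class IoU (%) of the proposed method, copied from the table above (19 classes).
proposed = [97.3, 73.5, 72.1, 49.3, 58.5, 79.8, 82.8, 23.6, 92.9, 73.0,
            79.7, 27.1, 91.8, 68.5, 86.9, 75.8, 72.0, 70.0, 75.1]
```

`round(mean_iou(proposed), 1)` reproduces the reported 71.0. Because the mean is taken over classes rather than points, rare classes such as motorcyclist weigh as much as road.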

Method	Parameters/10⁶	Memory/GB	Inference time/ms	mIoU/%
PolarNet	13.0	7.7	160	58.2
MinkNet42	21.7	4.2	114	61.1
Cylinder3D	53.3	3.4	295	66.9
Proposed (unconverted)	103.7	4.7	170	70.5
Proposed (converted)	100.7	4.5	159	70.5
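The last two rows show what the model conversion buys. We assume this is the re-parameterization of Rep-Spconv3D into single-branch kernels. The short calculation below uses only the table values. The dictionary layout and the helper `relative_drop` are illustrative.

```python
rows = {  # values copied from the two "Proposed" rows of the table above
    "unconverted": {"params_m": 103.7, "mem_gb": 4.7, "ms": 170, "miou": 70.5},
    "converted":   {"params_m": 100.7, "mem_gb": 4.5, "ms": 159, "miou": 70.5},
}

def relative_drop(before, after):
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100.0

a, b = rows["unconverted"], rows["converted"]
param_drop = relative_drop(a["params_m"], b["params_m"])  # about 2.9 %
time_drop = relative_drop(a["ms"], b["ms"])               # about 6.5 %
```

Conversion leaves mIoU unchanged at 70.5%. It trims about 2.9% of the parameters and about 6.5% of the inference time, which is the behavior expected from an equivalent, merged network.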

nuScenes: mIoU and per-class IoU (%)
Method	mIoU	barrier	bicycle	bus	car	construction vehicle	motorcycle	pedestrian	traffic cone	trailer	truck	driveable surface	other flat	sidewalk	terrain	manmade	vegetation
PolarNet [9]	69.4	72.2	16.8	77.0	86.5	51.1	69.7	64.8	54.1	69.7	63.5	96.6	67.1	77.7	72.1	87.1	84.5
JS3C-Net [27]	73.6	80.1	26.2	87.8	84.5	55.2	72.6	71.3	66.3	76.8	71.2	96.8	64.5	76.9	74.1	87.5	86.1
Cylinder3D [13]	77.2	82.8	29.8	84.3	89.4	63.0	79.3	77.2	73.4	84.6	69.1	97.7	70.2	80.3	75.5	90.4	87.6
AMVNet [28]	77.4	80.6	32.0	81.7	88.9	67.1	84.3	76.1	73.5	84.9	67.3	97.5	67.4	79.4	75.5	91.5	88.7
Proposed	78.2	82.2	34.7	84.0	87.5	71.4	83.2	78.9	74.2	85.0	68.3	97.4	68.7	79.8	75.9	91.6	88.6
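The abstract reports gains of 1.3 and 0.8 percentage points. These are the differences between the proposed method and the strongest prior entry in each results table: (AF)²-S3Net in the 19-class table and AMVNet in the 16-class table. The check below uses only mIoU values copied from the two tables. The helper name `gain_over_best` is illustrative.

```python
semantickitti = {"Darknet53": 49.9, "RandLA-Net": 50.3, "RangeNet++": 52.2,
                 "PolarNet": 54.3, "MinkNet42": 54.3, "KPConv": 58.8,
                 "SalsaNext": 59.5, "FusionNet": 61.3, "Cylinder3D": 67.8,
                 "(AF)2-S3Net": 69.7}
nuscenes = {"PolarNet": 69.4, "JS3C-Net": 73.6, "Cylinder3D": 77.2, "AMVNet": 77.4}

def gain_over_best(baselines, ours):
    """Return the strongest baseline and the mIoU gain over it, in points."""
    best_name = max(baselines, key=baselines.get)
    return best_name, round(ours - baselines[best_name], 1)

# Per-class IoU of the proposed method on the 16-class table; it averages to the reported 78.2.
ns_classes = [82.2, 34.7, 84.0, 87.5, 71.4, 83.2, 78.9, 74.2,
              85.0, 68.3, 97.4, 68.7, 79.8, 75.9, 91.6, 88.6]
```

`gain_over_best(semantickitti, 71.0)` gives `("(AF)2-S3Net", 1.3)`, and `gain_over_best(nuscenes, 78.2)` gives `("AMVNet", 0.8)`. The gains are absolute percentage points, not relative percentages.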

Baseline	Conical partition	Rep-Spconv3D	Top-k loss	Geo-aware loss	mIoU/%
√					60.6
√	√				63.0
√	√	√			65.8
√	√	√	√		67.9
√	√	√	√	√	70.5
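This section does not define the Top-k loss. The following is a minimal sketch that assumes it follows the average top-k idea of ref. [23]. Under that assumption, per-point cross-entropy is computed and only the k largest values are averaged, so hard points (often small or distant objects) drive the gradient. The ratio `k_ratio` and both function names are hypothetical, and the Geo-aware loss is not sketched.

```python
import math

def cross_entropy(logits, label):
    """Per-point softmax cross-entropy, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def top_k_loss(batch_logits, labels, k_ratio=0.25):
    """Average of the k largest per-point losses (k_ratio is an assumed setting)."""
    losses = [cross_entropy(l, y) for l, y in zip(batch_logits, labels)]
    k = max(1, int(len(losses) * k_ratio))
    hardest = sorted(losses, reverse=True)[:k]
    return sum(hardest) / k
```

Consider three easy points and one misclassified point with `k_ratio=0.25`. The loss equals the misclassified point's cross-entropy alone, whereas the ordinary mean would dilute it by a factor of about four. With `k_ratio=1.0` the function reduces to plain mean cross-entropy.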

Spconv3D 3×3×3	Spconv3D 1×1×1	Identity layer	mIoU/%	Inference time (unconverted model)/ms
√			68.4	137
√	√		69.1	148
√		√	68.5	146
√	√	√	70.5	149
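The table above trains with up to three parallel branches: a 3×3×3 sparse convolution, a 1×1×1 sparse convolution and an identity path. RepVGG-style re-parameterization [21] can merge such branches into a single 3×3×3 kernel for inference. A submanifold sparse convolution [11] produces outputs only at voxels that are already active. The 1×1×1 and identity branches therefore read exactly the centre tap, and they can be folded into the centre weight. This is a minimal sketch under several assumptions. It uses a single feature channel, no bias, and no batch-norm folding. With C channels, the identity would become an identity matrix at the centre tap. The function names are hypothetical.

```python
import itertools
import random

OFFSETS = list(itertools.product((-1, 0, 1), repeat=3))  # the 27 taps of a 3x3x3 kernel

def subm_conv(features, kernel):
    """Single-channel submanifold sparse 3D convolution.

    features: dict {(x, y, z): value} of active voxels.
    kernel:   dict {(dx, dy, dz): weight}; missing taps count as zero.
    Outputs are produced only at the input's active voxels.
    """
    out = {}
    for (x, y, z) in features:
        acc = 0.0
        for (dx, dy, dz), w in kernel.items():
            nb = (x + dx, y + dy, z + dz)
            if nb in features:            # inactive neighbours contribute nothing
                acc += w * features[nb]
        out[(x, y, z)] = acc
    return out

def multi_branch(features, k3, w1):
    """Training-time block: 3x3x3 branch + 1x1x1 branch (weight w1) + identity."""
    y3 = subm_conv(features, k3)
    return {p: y3[p] + w1 * v + v for p, v in features.items()}

def fuse(k3, w1):
    """Inference-time kernel: add the 1x1x1 weight and the identity to the centre tap."""
    fused = dict(k3)
    fused[(0, 0, 0)] = fused.get((0, 0, 0), 0.0) + w1 + 1.0
    return fused
```

To verify, draw a random kernel `k3` over `OFFSETS`, a scalar `w1` and a random sparse voxel dict. `multi_branch(feats, k3, w1)` and `subm_conv(feats, fuse(k3, w1))` should agree up to floating-point rounding. The extra branches therefore add accuracy during training without adding branches at inference.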