Efficient Automatic Driving Instance Segmentation Method Based on Detection

doi:10.19562/j.chinasae.qcgc.2023.04.002

Abstract

Abstract:

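Deep-learning-based instance segmentation algorithms have achieved good segmentation performance in large-scale general scenes. However, multi-object instance segmentation in complex traffic scenes remains highly challenging, especially in balancing high accuracy against fast inference, a trade-off that is critical to the driving safety of intelligent vehicles. To address this, this paper builds on the real-time algorithm OrienMask and proposes a multi-head instance segmentation framework based on a one-stage detector. The framework consists of a backbone network, a feature fusion module and a multi-head mask construction module. First, residual structures are added to the backbone to obtain more complete high-dimensional representations. Second, to produce more discriminative features, the feature pyramid is rebuilt with self-calibrated convolutions, and a global attention mechanism is used to improve information propagation, further optimizing the feature fusion module. Finally, the proposed multi-head mask construction mechanism significantly improves segmentation performance on objects of different sizes by refining the size distribution of scene objects. Extensive tests on the open-source BDD100K dataset give a mean average precision (mAP@0.5:0.95) of 23.3% for bounding boxes and 19.4% for masks, improvements of 5.2 and 2.2 percentage points over the baseline. Road experiments on a self-built real-vehicle platform further show that the algorithm adapts well to real driving environments and meets real-time segmentation requirements.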
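The feature pyramid is rebuilt with self-calibrated convolutions (SCConv; Liu et al., CVPR 2020). As a hedged illustration of that operator only, the NumPy sketch below follows the published SCConv formulation: the input channels are split in half, one half passes through a plain 3x3 convolution (K1), and the other half is gated by a sigmoid computed from an average-pooled, convolved (K2) and upsampled copy of itself before kernels K3 and K4 are applied. It is not the authors' code: batch normalization and activations are omitted, nearest-neighbour upsampling stands in for interpolation, the pooling ratio r=4 and all weights are arbitrary, and every function name is hypothetical.

```python
import numpy as np


def conv3x3(x, w):
    """Zero-padded 3x3 convolution (cross-correlation), stride 1.

    x: (C_in, H, W) feature map; w: (C_out, C_in, 3, 3) kernel.
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # accumulate the contribution of kernel tap (i, j) at every position
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def avg_pool(x, r):
    """Non-overlapping r x r average pooling; H and W must be divisible by r."""
    c, h, w = x.shape
    return x.reshape(c, h // r, r, w // r, r).mean(axis=(2, 4))


def upsample_nearest(x, r):
    """Nearest-neighbour upsampling by an integer factor r."""
    return x.repeat(r, axis=1).repeat(r, axis=2)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def self_calibrated_conv(x, k1, k2, k3, k4, r=4):
    """SCConv forward pass without BN/ReLU; each k* maps C/2 -> C/2 channels."""
    half = x.shape[0] // 2
    x1, x2 = x[:half], x[half:]
    # self-calibration branch: the gate is computed in a downsampled (wider-context) space
    context = upsample_nearest(conv3x3(avg_pool(x1, r), k2), r)
    gate = sigmoid(x1 + context)
    y1 = conv3x3(conv3x3(x1, k3) * gate, k4)
    # plain branch: an ordinary 3x3 convolution on the other half of the channels
    y2 = conv3x3(x2, k1)
    return np.concatenate([y1, y2], axis=0)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))  # C=8 channels, 16x16 map
kernels = [0.1 * rng.standard_normal((4, 4, 3, 3)) for _ in range(4)]
y = self_calibrated_conv(x, *kernels)
print(y.shape)  # (8, 16, 16)
```

The four C/2-to-C/2 kernels together hold the same number of weights as one standard C-to-C 3x3 convolution, which is why SCConv can replace ordinary convolutions in a feature pyramid without enlarging the model.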

Keywords: autonomous driving, deep learning, object detection, instance segmentation

Abstract:

Deep-learning-based instance segmentation algorithms have achieved excellent performance in large-scale general scenarios. However, multi-object instance segmentation in complex traffic scenes is still challenging, especially in balancing high accuracy against fast inference, which is crucial to the driving safety of intelligent vehicles. In view of this, building on the real-time algorithm OrienMask, a multi-head instance segmentation framework based on a one-stage detection method is proposed. Specifically, the proposed framework comprises a backbone, a feature fusion module and a multi-head mask construction module. Firstly, more complete high-dimensional feature maps are obtained by adding residual structures to the backbone. Secondly, to generate discriminative feature representations, the feature pyramid is reconstructed by introducing self-calibrated convolutions, and the information propagation path is improved with a global attention mechanism, further optimizing the feature fusion module of the proposed framework. Finally, a multi-head mask construction mechanism is proposed that significantly improves the segmentation performance for different targets by refining the size distribution of instances in traffic scenes. The proposed algorithm has been tested and validated on the open-source dataset BDD100K, achieving a mean average precision (mAP@0.5:0.95) of 23.3% on bounding boxes and 19.4% on segmentation masks. Compared with the baseline, these metrics increase by 5.2 and 2.2 percentage points, respectively. In addition, road experiments on a self-built real-vehicle platform show that the proposed algorithm adapts to real driving environments and meets the demands of real-time segmentation.
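The abstract does not name the global attention mechanism used on the fusion path. One widely used option is the global context (GC) block of GCNet (Cao et al., 2019), and the NumPy sketch below implements that block purely as an assumed example, with hypothetical function names and random weights. It pools the whole feature map into one context vector using softmax attention over positions, passes the vector through a bottleneck transform (1x1 conv, layer normalization, ReLU, 1x1 conv) and adds the result back to every position.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def layer_norm(v, eps=1e-5):
    return (v - v.mean()) / np.sqrt(v.var() + eps)


def global_context_block(x, w_k, w_v1, w_v2):
    """GC block without the learnable LayerNorm affine parameters.

    x: (C, H, W); w_k: (C,) attention 1x1 conv; w_v1: (C//r, C); w_v2: (C, C//r).
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                            # (C, N), N = H*W positions
    attn = softmax(w_k @ flat)                            # (N,) weights summing to 1
    context = flat @ attn                                 # (C,) attention-pooled global feature
    hidden = np.maximum(layer_norm(w_v1 @ context), 0.0)  # bottleneck + LN + ReLU
    delta = w_v2 @ hidden                                 # (C,) channel-wise update
    return x + delta[:, None, None]                       # same update added at every position


rng = np.random.default_rng(1)
c, r = 16, 4
x = rng.standard_normal((c, 8, 8))
w_k = rng.standard_normal(c)
w_v1 = rng.standard_normal((c // r, c))
w_v2 = rng.standard_normal((c, c // r))
y = global_context_block(x, w_k, w_v1, w_v2)
print(y.shape)  # (16, 8, 8)
```

Because one context vector is shared by all positions, the block costs a single attention-pooling pass plus two small matrix-vector products, much cheaper than a full non-local block that builds an N x N affinity matrix.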

Key words: autonomous vehicles, deep learning, object detection, instance segmentation
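mAP@0.5:0.95 is a mean average precision, not a mean IoU: AP is computed at each IoU threshold 0.50, 0.55, ..., 0.95 and the ten values are averaged (published figures are further averaged over classes). The sketch below illustrates this for one class in one image, using greedy score-ordered matching and all-point interpolation; `mask_iou` shows how each entry of the IoU matrix would be obtained from two masks. It is a simplified assumption for illustration, not the COCO evaluator (which uses 101-point interpolation and pools matches over the whole dataset), and all names are hypothetical.

```python
import numpy as np

IOU_THRESHOLDS = np.linspace(0.50, 0.95, 10)  # 0.50, 0.55, ..., 0.95


def mask_iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def ap_at(iou, scores, thr):
    """AP for one class in one image at IoU threshold thr.

    iou: (num_pred, num_gt) IoU matrix; scores: (num_pred,) confidences.
    """
    num_pred, num_gt = iou.shape
    matched = np.zeros(num_gt, dtype=bool)
    tp = np.zeros(num_pred)
    for rank, i in enumerate(np.argsort(-scores)):  # highest score first
        cand = np.where(matched, -1.0, iou[i])      # each GT can be matched only once
        j = int(np.argmax(cand)) if num_gt else -1
        if j >= 0 and cand[j] >= thr:
            matched[j] = True
            tp[rank] = 1.0
    cum_tp = np.cumsum(tp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.arange(1, num_pred + 1)
    precision = np.maximum.accumulate(precision[::-1])[::-1]  # precision envelope
    return float(np.sum(np.diff(np.concatenate([[0.0], recall])) * precision))


def map_50_95(iou, scores):
    return float(np.mean([ap_at(iou, scores, t) for t in IOU_THRESHOLDS]))


# Two ground-truth objects: prediction 0 fits object 0 tightly, prediction 1 fits object 1 loosely.
iou = np.array([[0.93, 0.00],
                [0.00, 0.62]])
scores = np.array([0.9, 0.8])
print(map_50_95(iou, scores))  # 0.6
```

Here AP is 1.0 at the three thresholds up to 0.62, 0.5 at the six thresholds from 0.65 to 0.90, and 0 at 0.95, so the average is 0.6. A box-level mAP follows the same procedure with box IoU in place of mask IoU.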

Yanyan Chen, Hai Wang, Yingfeng Cai, Long Chen, Yicheng Li. Efficient Automatic Driving Instance Segmentation Method Based on Detection[J]. Automotive Engineering, 2023, 45(4): 541-550.

Figures/Tables: 13 (Figures 1-10, Tables 1-3)


Layer	Anchor 0	Anchor 1	Anchor 2
Layer 1	(8, 10)	(14, 12)	(17, 20)
Layer 2	(14, 37)	(36, 15)	(25, 29)
Layer 3	(34, 44)	(67, 30)	(47, 69)
Layer 4	(78, 98)	(127, 175)	(271, 337)
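The four rows of anchors grow from about (8, 10) to (271, 337), which indicates how objects of different sizes are spread over the four prediction layers. The sketch below assumes YOLOv3-style assignment, in which a ground-truth box is handled by the anchor (and hence the layer) whose shape overlaps it best when both share one centre, and assumes each pair is (width, height) in input-image pixels. Neither assumption is stated in the source, and `assign_to_head` is a hypothetical helper.

```python
import numpy as np

# Anchor sizes copied from the table, assumed to be (width, height); row i = layer i+1.
ANCHORS = np.array([
    [(8, 10), (14, 12), (17, 20)],
    [(14, 37), (36, 15), (25, 29)],
    [(34, 44), (67, 30), (47, 69)],
    [(78, 98), (127, 175), (271, 337)],
], dtype=float)  # shape (4 layers, 3 anchors, 2)


def shape_iou(wh, anchors):
    """IoU of a (w, h) box with each anchor, all boxes sharing one centre."""
    inter = np.minimum(wh[0], anchors[..., 0]) * np.minimum(wh[1], anchors[..., 1])
    union = wh[0] * wh[1] + anchors[..., 0] * anchors[..., 1] - inter
    return inter / union


def assign_to_head(wh):
    """Return (layer number 1-4, anchor index 0-2, IoU) of the best-matching anchor."""
    ious = shape_iou(np.asarray(wh, dtype=float), ANCHORS)  # (4, 3)
    layer, k = np.unravel_index(np.argmax(ious), ious.shape)
    return int(layer) + 1, int(k), round(float(ious[layer, k]), 3)


for box in [(10, 10), (30, 60), (300, 300)]:
    print(box, assign_to_head(box))
# (10, 10) (1, 0, 0.8)
# (30, 60) (3, 0, 0.668)
# (300, 300) (4, 2, 0.813)
```

Under this rule each layer receives objects whose shapes resemble its own anchors, so small pedestrians and large buses end up on different heads.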

Comparison on BDD100K (per-class columns: mask AP@0.5:0.95, %)
Method	Pedestrian	Rider	Car	Truck	Bus	Train	Motorcycle	Bicycle	mAP@0.5:0.95 (seg)	mAP@0.5:0.95 (box)	FPS
Mask R-CNN	27.6	6.3	43.9	20.8	23.1	0.0	2.0	5.9	16.2	22.3	14.3
Cascade Mask R-CNN	28.8	7.3	45.4	24.7	27.1	0.0	9.9	5.6	18.6	25.9	13.2
GCNet	28.0	4.35	43.9	22.4	20.9	0.0	5.09	4.16	16.1	22.4	13.9
YOLACT	-	-	-	-	-	-	-	-	15.4	18.5	20.0
SOLOv2-Lite	19.2	7.2	35.9	19.9	26.8	0.0	15.6	5.4	16.2	-	36.7
Baseline	16.3	5.3	38.4	25.2	26.2	0.0	22.4	3.9	17.2	18.1	38.56
Ours	19.2	4.0	43.3	30.8	26.8	11.9	16.1	3.3	19.4	23.3	27.7
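As a consistency check on the comparison table, the eight per-class values of every fully reported row average to that row's mAP@0.5:0.95 (seg) entry within rounding, which is what identifies the per-class columns as mask AP. The short script below recomputes those averages from the table's own numbers (YOLACT is skipped because it has no per-class values).

```python
# Per-class values (Pedestrian ... Bicycle) and reported mAP@0.5:0.95 (seg), copied from the table.
ROWS = {
    "Mask R-CNN": ([27.6, 6.3, 43.9, 20.8, 23.1, 0.0, 2.0, 5.9], 16.2),
    "Cascade Mask R-CNN": ([28.8, 7.3, 45.4, 24.7, 27.1, 0.0, 9.9, 5.6], 18.6),
    "GCNet": ([28.0, 4.35, 43.9, 22.4, 20.9, 0.0, 5.09, 4.16], 16.1),
    "SOLOv2-Lite": ([19.2, 7.2, 35.9, 19.9, 26.8, 0.0, 15.6, 5.4], 16.2),
    "Baseline": ([16.3, 5.3, 38.4, 25.2, 26.2, 0.0, 22.4, 3.9], 17.2),
    "Ours": ([19.2, 4.0, 43.3, 30.8, 26.8, 11.9, 16.1, 3.3], 19.4),
}

gaps = {}
for name, (per_class, reported) in ROWS.items():
    mean = sum(per_class) / len(per_class)  # unweighted mean over the 8 classes
    gaps[name] = abs(mean - reported)
    print(f"{name:<20} mean={mean:6.3f} reported={reported}")
```

The largest gap is 0.05 (SOLOv2-Lite, whose mean is 16.25), consistent with rounding to one decimal place.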

Ablation study (√: module added to the baseline)
Method	CSP	FPN (Sconv)	PANet (Sconv)	Multi-head	mAP@0.5:0.95 (box)	mAP@0.5 (box)	mAP@0.5:0.95 (seg)	mAP@0.5 (seg)
Baseline					18.1	35.3	17.2	31.9
+	√				18.2	35.3	17.2	31.9
+	√	√			18.6	35.9	17.7	32.7
+	√	√	√		20.8	38.4	18.4	34.2
+	√	√	√	√	23.3	41.0	19.4	36.0
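The contribution of each module can be read off the ablation table by differencing consecutive cumulative rows. The snippet below does this for mAP@0.5:0.95; the total gain reproduces the 5.2-point (box) and 2.2-point (seg) improvements over the baseline reported in the abstract.

```python
# Cumulative ablation rows: (module added, mAP@0.5:0.95 box, mAP@0.5:0.95 seg).
ROWS = [
    ("baseline", 18.1, 17.2),
    ("+ CSP", 18.2, 17.2),
    ("+ FPN (Sconv)", 18.6, 17.7),
    ("+ PANet (Sconv)", 20.8, 18.4),
    ("+ Multi-head", 23.3, 19.4),
]

steps = []
for prev, cur in zip(ROWS, ROWS[1:]):
    d_box = round(cur[1] - prev[1], 1)  # gain from the module added in this row
    d_seg = round(cur[2] - prev[2], 1)
    steps.append((cur[0], d_box, d_seg))
    print(f"{cur[0]:<16} box {d_box:+.1f}  seg {d_seg:+.1f}")

total_box = round(ROWS[-1][1] - ROWS[0][1], 1)
total_seg = round(ROWS[-1][2] - ROWS[0][2], 1)
print(f"{'total':<16} box {total_box:+.1f}  seg {total_seg:+.1f}")  # box +5.2  seg +2.2
```

The per-step box gains are +0.1, +0.4, +2.2 and +2.5, so the PANet (Sconv) path and the multi-head module account for most of the improvement, while the CSP change alone adds little.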
