Automotive Engineering ›› 2022, Vol. 44 ›› Issue (10): 1503-1510. DOI: 10.19562/j.chinasae.qcgc.2022.10.004

Special Topic: Intelligent and Connected Vehicle Technology - Perception & HMI & Evaluation 2022

Vehicle Visual SLAM in Dynamic Scenes Based on Semantic Segmentation and Motion Consistency Constraints

Shengjie Huang1, Manjiang Hu1,2, Yunshui Zhou1,2, Zhouping Yin1, Xiaohui Qin1,2, Yougang Bian1,2, Qianqian Jia3

  1. State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body, College of Mechanical and Vehicle Engineering, Hunan University, Changsha 410082
    2. Wuxi Intelligent Control Research Institute of Hunan University, Wuxi 214115
    3. China Society of Automotive Engineers, Beijing 100000
  • Online: 2022-10-25  Published: 2022-10-21
  • Contact: Xiaohui Qin  E-mail: qxh880507@163.com
  • Supported by: National Key R&D Program of China (2021YFB2501800); National Natural Science Foundation of China (52172384); Natural Science Foundation of Changsha (KQ2202162); Independent Research Project of the State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body (61775006)

Abstract:

Traditional simultaneous localization and mapping (SLAM) methods for vehicles generally rely on the assumption of a static environment, so pose estimation accuracy may degrade and the front-end visual odometry may even fail to track in dynamic scenes. This paper proposes a visual SLAM method for dynamic scenes that combines the Fast-SCNN real-time semantic segmentation network with motion consistency constraints. First, Fast-SCNN is used to obtain segmentation masks of potentially dynamic objects and to remove the feature points on them, yielding a preliminary estimate of the camera pose. Then, based on motion constraints and a chi-square test, static points on the potentially dynamic objects are re-added to further optimize the camera pose. Tests on the validation set show that the mean pixel accuracy and mean intersection over union (mIoU) of the trained semantic segmentation network both exceed 90%, with a per-frame processing time of about 14.5 ms, meeting the segmentation accuracy and real-time requirements of the SLAM system. Tests on the public dataset of the Technical University of Munich (TUM) and on a real-vehicle dataset show that ORB-SLAM3 integrated with the proposed algorithm achieves an average improvement of more than 80% on some metrics, significantly enhancing the accuracy and robustness of SLAM in dynamic scenes and helping to ensure the safety of intelligent vehicles.
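To make the two-stage idea in the abstract concrete, the following Python sketch illustrates (not the authors' implementation) how feature points falling inside a semantic mask of potentially dynamic objects can first be set aside, and how points that are nonetheless consistent with the estimated camera motion can be re-added via a chi-square test on the reprojection residual. The function names, the pixel-noise sigma, and the 5.991 threshold (95% confidence, 2 degrees of freedom) are illustrative assumptions, not values taken from the paper.

    import numpy as np
    import cv2

    # Illustrative sketch only: split keypoints by a binary mask of potentially
    # dynamic classes (e.g. from a Fast-SCNN-style segmentation network), then
    # re-admit masked points whose reprojection error under the current camera
    # pose estimate passes a chi-square consistency test.

    CHI2_THRESH_2DOF_95 = 5.991  # assumed 95% chi-square threshold, 2-DoF residual

    def split_by_mask(keypoints, dynamic_mask):
        """Separate cv2.KeyPoint objects into static / potentially dynamic sets."""
        static, dynamic = [], []
        for kp in keypoints:
            u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
            (dynamic if dynamic_mask[v, u] else static).append(kp)
        return static, dynamic

    def readd_static_points(points_3d, points_2d, K, rvec, tvec, sigma_px=1.0):
        """Return a boolean mask of masked-out points that are actually static,
        i.e. whose reprojection residual is consistent with the camera motion."""
        proj, _ = cv2.projectPoints(points_3d, rvec, tvec, K, None)
        residual = points_2d - proj.reshape(-1, 2)
        chi2 = np.sum((residual / sigma_px) ** 2, axis=1)
        return chi2 < CHI2_THRESH_2DOF_95

In this reading of the abstract, points kept by readd_static_points would rejoin the pose optimization, so static structure on parked vehicles or standing pedestrians is not discarded wholesale.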

Key words: intelligent vehicle, simultaneous localization and mapping, semantic segmentation, dynamic scenes, motion consistency