Automotive Engineering ›› 2023, Vol. 45 ›› Issue (9): 1617-1625. doi: 10.19562/j.chinasae.qcgc.2023.09.010

Special Topic: Intelligent and Connected Vehicle Technology: Perception & HMI & Evaluation, 2023


Lightweight Semantic Segmentation Method Based on Local Window Cross Attention

Zuliang Jin1, Hanbing Wei1, Liu Zheng1,2, Lu Lou1, Guofeng Zheng1

  1. School of Electromechanical and Vehicle Engineering, Chongqing Jiaotong University, Chongqing 400074
  2. University of British Columbia Okanagan, Kelowna, BC, Canada
  • Received: 2022-11-28  Revised: 2023-01-03  Online: 2023-09-25  Published: 2023-09-23
  • Corresponding author: Hanbing Wei, E-mail: hbwei@cqjtu.edu.cn
  • Funding: National Natural Science Foundation of China (52172381)


Abstract:

In the environmental perception task of autonomous vehicles, semantic segmentation of lanes, vehicles, and other targets from surround-view cameras in a unified bird's eye view (BEV) coordinate frame has attracted wide attention. To address the problem that inference latency rises linearly with the number of cameras, making real-time semantic segmentation difficult, this paper proposes a lightweight semantic segmentation method based on local window cross attention. An improved EdgeNeXt backbone network is adopted to extract features, and local window cross attention between BEV queries and image features is constructed to perform feature queries across camera perspective views. The fused BEV feature map is then decoded by upsampling residual blocks to obtain the BEV semantic segmentation result. Experimental results on the public nuScenes dataset show that the method achieves a mean IoU of 35.1% in static lane segmentation on the BEV map, 2.2% higher than the well-performing HDMapNet, and its inference speed is 58.2% faster than the fast GKT, reaching a frame rate of 106 FPS.

Key words: bird's eye view (BEV), semantic segmentation, local window, cross attention

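The local window cross attention described in the abstract restricts each BEV query to attend only to image features inside a small local window, so per-query cost depends on the window size rather than the full camera feature map. The following minimal pure-Python sketch illustrates that core computation for a single head; the function names, list-based shapes, and the simplification of using image features as both keys and values are illustrative assumptions, not the paper's implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def window_cross_attention(bev_queries, image_features, window_indices, dim):
    """Each BEV query attends only to the image features whose indices fall
    inside its local window, instead of attending to every image feature.

    bev_queries:    list of query vectors, one per BEV grid cell
    image_features: list of feature vectors from the camera feature maps
    window_indices: window_indices[i] lists the feature indices query i sees
    dim:            feature dimension, used for the 1/sqrt(d) scaling
    """
    outputs = []
    scale = 1.0 / math.sqrt(dim)
    for q, idxs in zip(bev_queries, window_indices):
        keys = [image_features[j] for j in idxs]
        # scaled dot-product scores restricted to the local window
        scores = [scale * sum(qc * kc for qc, kc in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # weighted sum of the window features (used here as both key and value)
        out = [sum(w * k[c] for w, k in zip(weights, keys)) for c in range(dim)]
        outputs.append(out)
    return outputs
```

With a window of w features and dimension d, each query costs O(w·d) rather than O(H·W·d) for global attention over an H×W feature map, which is consistent with the latency reduction the abstract reports; the actual model would batch this over tensors with multiple heads.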