Automotive Engineering, 2025, Vol. 47, Issue 10: 1914-1922. doi: 10.19562/j.chinasae.qcgc.2025.10.007

High-Fidelity Unsupervised Image Translation for Mismatched Semantic Data

Zhuo Li¹, Libo Cao¹, Jiacai Liao², Haowei Cui³, Yue Zhang³

  1. Hunan University, State Key Laboratory of Advanced Design and Manufacturing Technology for Vehicle, Changsha 410082
  2. College of Mechanical and Vehicle Engineering, Changsha University of Science & Technology, Changsha 410076
  3. North China Vehicle Research Institute, Beijing 100072
  • Received: 2025-03-18  Revised: 2025-04-30  Published: 2025-10-25  Online: 2025-10-20
  • Corresponding author: Libo Cao  E-mail: hdclb@163.com
  • Funding: Independent Research Project of the State Key Laboratory of Advanced Design and Manufacturing Technology for Vehicle, Hunan University (72475001)

Abstract:

In intelligent-driving environment perception, image translation models often fail when the semantics of the source and target domains are significantly mismatched, which causes semantic inversion and loss of detail. To address this problem, this paper proposes a high-fidelity image translation method for asymmetric-domain data. Built on a diffusion-model generator, the method introduces a multi-adaptive skip connection (MASC) module and a high-dimensional vector consistency loss (HVC loss). MASC combines dynamic normalization with an attention mechanism to adaptively process the semantic and style information carried by the skip connections, while the HVC loss constrains the semantic mapping in a high-dimensional vector space. Compared with the best results of the competing methods, the proposed model reduces FID and KID by 15.36 and 0.0034, respectively, on the self-built RainSurface dataset, and by 1.44 and 0.0008 on public datasets.

Key words: image translation, deep learning, data augmentation, diffusion models
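FID and KID both measure the distance between feature distributions of real and translated images, so lower values are better. To make the KID figures easier to read, here is a minimal sketch of the standard unbiased KID estimator: the squared maximum mean discrepancy (MMD) under the cubic polynomial kernel k(x, y) = (xᵀy / d + 1)³, where d is the feature dimension. It assumes the feature vectors have already been extracted by a pretrained network (typically Inception-v3 pool features); the function names `polynomial_kernel` and `kid` are hypothetical, and this is not the authors' evaluation code.

```python
from itertools import product
from typing import Sequence

Vector = Sequence[float]


def polynomial_kernel(x: Vector, y: Vector) -> float:
    """Cubic polynomial kernel used by KID: (x . y / d + 1) ** 3."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kid(real: Sequence[Vector], fake: Sequence[Vector]) -> float:
    """Unbiased estimate of squared MMD between two feature sets.

    real, fake: lists of equal-length feature vectors. These are assumed
    inputs here; in practice they come from a pretrained feature extractor.
    """
    m, n = len(real), len(fake)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples per set")

    # Within-set terms skip the diagonal (i == j) so the estimate is unbiased.
    k_rr = sum(polynomial_kernel(real[i], real[j])
               for i, j in product(range(m), repeat=2) if i != j)
    k_ff = sum(polynomial_kernel(fake[i], fake[j])
               for i, j in product(range(n), repeat=2) if i != j)
    # The cross term averages over all m * n real/fake pairs.
    k_rf = sum(polynomial_kernel(x, y) for x, y in product(real, fake))

    return k_rr / (m * (m - 1)) + k_ff / (n * (n - 1)) - 2.0 * k_rf / (m * n)


if __name__ == "__main__":
    real = [[0.0], [0.0]]
    fake = [[1.0], [1.0]]
    print(kid(real, fake))  # 7.0
```

Because the estimator is unbiased, it can come out slightly negative when the two sets are very similar. Reported KID values are usually averaged over several random subsets of the features, whereas FID instead compares Gaussian fits (means and covariances) of the same kind of features.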