Automotive Engineering ›› 2024, Vol. 46 ›› Issue (9): 1697-1706. doi: 10.19562/j.chinasae.qcgc.2024.09.017


Cockpit Facial Expression Recognition Model Based on Attention Fusion and Feature Enhancement Network

Yutao Luo1,2, Fengrui Guo1,2

  1. School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510640
  2. Guangdong Provincial Key Laboratory of Automotive Engineering, Guangzhou 510640
  • Received: 2024-02-26  Revised: 2024-04-20  Online: 2024-09-25  Published: 2024-09-19
  • Contact: Yutao Luo  E-mail: ctytluo@scut.edu.cn
  • Supported by: Special Fund for High-Quality Development of the Manufacturing Industry of the Ministry of Industry and Information Technology (R-ZH-023-QT-001-20221009-001); Guangzhou Science and Technology Plan Project (2023B01J0016)


Abstract:

To address the difficulty of balancing accuracy and real-time performance in deep learning models for driver facial expression recognition in the intelligent cockpit, an expression recognition model based on an attention fusion and feature enhancement network, named EmotionNet, is proposed. Built on GhostNet, the model uses two detection branches within the feature extraction module to fuse the coordinate attention and channel attention mechanisms, so that the two mechanisms complement each other and important features receive all-round attention. A feature-enhancement neck network is established to fuse feature information of different scales, and decision-level fusion of the multi-scale feature information is finally performed by the head network. In training, transfer learning and the center loss function are introduced to further improve recognition accuracy. In experiments on the RAF-DB and KMU-FED datasets, the model achieves recognition accuracies of 85.23% and 99.95%, respectively, and reaches a recognition speed of 59.89 FPS in embedded-device testing. EmotionNet balances recognition accuracy and real-time performance, reaching a relatively advanced level and possessing a degree of applicability to intelligent cockpit expression recognition tasks.
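To make the attention-fusion idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a coordinate-attention branch and an SE-style channel-attention branch run in parallel on the same feature map and their outputs are fused. The module structure, the reduction ratio r, and the additive fusion are assumptions.

    # Hedged sketch of attention fusion: coordinate attention captures
    # position-aware information by pooling along H and W separately,
    # while SE-style channel attention reweights channels globally.
    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        """SE-style channel attention: squeeze (global pool), then excite."""
        def __init__(self, c, r=16):
            super().__init__()
            self.fc = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(c, c // r, 1), nn.ReLU(inplace=True),
                nn.Conv2d(c // r, c, 1), nn.Sigmoid())
        def forward(self, x):
            return x * self.fc(x)

    class CoordinateAttention(nn.Module):
        """Coordinate attention: directional pooling along H and W."""
        def __init__(self, c, r=16):
            super().__init__()
            m = max(8, c // r)
            self.reduce = nn.Sequential(
                nn.Conv2d(c, m, 1), nn.BatchNorm2d(m), nn.ReLU(inplace=True))
            self.conv_h = nn.Conv2d(m, c, 1)
            self.conv_w = nn.Conv2d(m, c, 1)
        def forward(self, x):
            n, c, h, w = x.shape
            xh = x.mean(dim=3, keepdim=True)                       # (N,C,H,1)
            xw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (N,C,W,1)
            y = self.reduce(torch.cat([xh, xw], dim=2))            # (N,m,H+W,1)
            yh, yw = torch.split(y, [h, w], dim=2)
            ah = torch.sigmoid(self.conv_h(yh))                    # (N,C,H,1)
            aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (N,C,1,W)
            return x * ah * aw

    class AttentionFusionBlock(nn.Module):
        """Two parallel attention branches; sum fusion is an assumption."""
        def __init__(self, c):
            super().__init__()
            self.coord = CoordinateAttention(c)
            self.chan = ChannelAttention(c)
        def forward(self, x):
            return self.coord(x) + self.chan(x)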
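The decision-level fusion in the head network could, under the same caveat, look like the sketch below: each feature scale gets its own classification head and the per-scale logits are averaged. The channel widths, the number of classes (e.g., the seven basic expressions of RAF-DB), and the equal weighting are assumptions.

    # Hedged sketch of decision-level fusion: one classifier per feature
    # scale, final prediction formed by averaging the per-scale logits.
    import torch
    import torch.nn as nn

    class DecisionFusionHead(nn.Module):
        def __init__(self, channels=(40, 112, 160), num_classes=7):
            super().__init__()
            self.heads = nn.ModuleList([
                nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                              nn.Linear(c, num_classes))
                for c in channels])
        def forward(self, feats):  # feats: one feature map per scale
            logits = [head(f) for head, f in zip(self.heads, feats)]
            return torch.stack(logits, dim=0).mean(dim=0)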
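Finally, the training objective combining cross-entropy with the center loss might be sketched as follows. Center loss (Wen et al., 2016) pulls same-class embedding features toward learnable class centers; the feature dimension and the weight lam are illustrative values, not taken from the paper.

    # Hedged sketch of the combined objective: L = L_ce + lam * L_center.
    import torch
    import torch.nn as nn

    class CenterLoss(nn.Module):
        def __init__(self, num_classes=7, feat_dim=128):
            super().__init__()
            self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        def forward(self, feats, labels):
            # mean squared distance between features and their class centers
            return ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()

    ce = nn.CrossEntropyLoss()
    center = CenterLoss()
    lam = 0.01  # assumed weighting of the center-loss term

    def total_loss(logits, feats, labels):
        return ce(logits, labels) + lam * center(feats, labels)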

Key words: intelligent cockpit, expression recognition, attention mechanism, feature enhancement network