Automotive Engineering, 2024, Vol. 46, Issue (1): 29-38. doi: 10.19562/j.chinasae.qcgc.2024.01.004
Biao Yang1, Zhiwen Wei1, Rongrong Ni1, Hai Wang2, Yingfeng Cai3, Changchun Yang1
Received: 2023-06-04
Revised: 2023-07-03
Online: 2024-01-25
Published: 2024-01-23
Contact: Yingfeng Cai
E-mail: caicaixiao0304@126.com
Biao Yang, Zhiwen Wei, Rongrong Ni, Hai Wang, Yingfeng Cai, Changchun Yang. Efficient Pedestrian Crossing Intention Anticipation Based on Action-Conditioned Interaction[J]. Automotive Engineering, 2024, 46(1): 29-38.
| Model | Variant | PIE Acc | PIE AUC | PIE F1 | PIE Pre | PIE Rec | JAAD_beh Acc | JAAD_beh AUC | JAAD_beh F1 | JAAD_beh Pre | JAAD_beh Rec | JAAD_all Acc | JAAD_all AUC | JAAD_all F1 | JAAD_all Pre | JAAD_all Rec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATGC | VGG16 | 0.71 | 0.60 | 0.41 | 0.49 | 0.36 | 0.59 | 0.52 | 0.71 | 0.63 | 0.82 | 0.82 | 0.75 | 0.55 | 0.49 | 0.63 |
| ATGC | ResNet50 | 0.70 | 0.59 | 0.38 | 0.47 | 0.32 | 0.46 | 0.45 | 0.54 | 0.58 | 0.51 | 0.81 | 0.72 | 0.52 | 0.47 | 0.56 |
| ConvLSTM | VGG16 | 0.58 | 0.55 | 0.39 | 0.32 | 0.49 | 0.53 | 0.49 | 0.64 | 0.64 | 0.64 | 0.63 | 0.57 | 0.32 | 0.24 | 0.48 |
| ConvLSTM | ResNet50 | 0.54 | 0.46 | 0.26 | 0.23 | 0.29 | 0.59 | 0.55 | 0.69 | 0.68 | 0.70 | 0.63 | 0.58 | 0.33 | 0.25 | 0.49 |
| SingleRNN | GRU | 0.83 | 0.77 | 0.67 | 0.70 | 0.64 | 0.58 | 0.54 | 0.67 | 0.67 | 0.68 | 0.65 | 0.59 | 0.34 | 0.26 | 0.49 |
| SingleRNN | LSTM | 0.81 | 0.75 | 0.64 | 0.67 | 0.61 | 0.51 | 0.48 | 0.61 | 0.63 | 0.59 | 0.78 | 0.75 | 0.54 | 0.44 | 0.70 |
| MultiRNN | GRU | 0.83 | 0.80 | 0.71 | 0.69 | 0.73 | 0.61 | 0.50 | 0.74 | 0.64 | 0.86 | 0.79 | 0.79 | 0.58 | 0.45 | 0.79 |
| StackedRNN | GRU | 0.82 | 0.78 | 0.67 | 0.67 | 0.68 | 0.60 | 0.60 | 0.66 | 0.73 | 0.61 | 0.79 | 0.79 | 0.58 | 0.46 | 0.79 |
| HierarchicalRNN | GRU | 0.82 | 0.77 | 0.67 | 0.68 | 0.66 | 0.53 | 0.50 | 0.63 | 0.64 | 0.61 | 0.80 | 0.79 | 0.59 | 0.47 | 0.79 |
| SFRNN | GRU | 0.82 | 0.79 | 0.69 | 0.67 | 0.70 | 0.51 | 0.45 | 0.63 | 0.61 | 0.64 | 0.84 | 0.84 | 0.65 | 0.54 | 0.84 |
| C3D | 3DConv | 0.77 | 0.67 | 0.52 | 0.63 | 0.44 | 0.61 | 0.51 | 0.75 | 0.63 | 0.91 | 0.84 | 0.81 | 0.65 | 0.57 | 0.75 |
| I3D | 3DConv | 0.80 | 0.73 | 0.62 | 0.67 | 0.58 | 0.62 | 0.56 | 0.73 | 0.68 | 0.79 | 0.81 | 0.74 | 0.63 | 0.66 | 0.61 |
| I3D | Optical | 0.81 | 0.83 | 0.72 | 0.60 | 0.90 | 0.62 | 0.51 | 0.75 | 0.65 | 0.88 | 0.84 | 0.80 | 0.63 | 0.55 | 0.73 |
| PCPA | 3DConv | 0.86 | 0.91 | 0.78 | 0.69 | 0.89 | 0.50 | 0.47 | 0.59 | 0.61 | 0.58 | 0.70 | 0.85 | 0.51 | 0.36 | 0.87 |
| Ped Graph+ | GCN | 0.89 | 0.90 | 0.81 | 0.83 | 0.79 | 0.70 | 0.70 | 0.76 | 0.77 | 0.75 | 0.86 | 0.88 | 0.65 | 0.58 | 0.75 |
| Ours | GCN | 0.90 | 0.87 | 0.82 | 0.86 | 0.78 | 0.69 | 0.65 | 0.77 | 0.73 | 0.81 | 0.89 | 0.84 | 0.79 | 0.86 | 0.73 |
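
The F1, Pre, and Rec columns in the table can be cross-checked against each other. The sketch below assumes the table reports the standard binary F1 score, i.e. the harmonic mean of precision and recall; the page itself does not define the metrics, so that is an assumption. The helper name `f1_score` is illustrative, and the input values are copied from the "Ours" row. Because the table entries are rounded to two decimals, agreement is only expected to within rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard binary F1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both precision and recall are zero
    return 2 * precision * recall / (precision + recall)


# (Pre, Rec, reported F1) for the "Ours" row, copied from the table.
ours = {
    "PIE": (0.86, 0.78, 0.82),
    "JAAD_beh": (0.73, 0.81, 0.77),
    "JAAD_all": (0.86, 0.73, 0.79),
}

for dataset, (pre, rec, reported) in ours.items():
    computed = f1_score(pre, rec)
    print(f"{dataset}: computed F1 = {computed:.2f}, reported F1 = {reported:.2f}")
```

Under this assumption, the recomputed values for the three datasets round to the reported F1 figures, which suggests the row is internally consistent.
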
1 | CHEN B, SUN D, ZHOU J, et al. A future intelligent traffic system with mixed autonomous vehicles and human-driven vehicles[J]. Information Sciences, 2020, 529: 59-72. |
2 | LIAN J, WANG X R, LI L H, et al. Pedestrian trajectory prediction based on human-vehicle interaction[J]. China Journal of Highway and Transport, 2021, 34(5): 215. (in Chinese) |
3 | LI J, SU W, WANG Z. Simple pose: rethinking and improving a bottom-up approach for multi-person pose estimation[C]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020: 11354-11361. |
4 | LUO Z, WANG Z, HUANG Y, et al. Rethinking the heatmap regression for bottom-up human pose estimation[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 13264-13273. |
5 | LIU J, ROJAS J, LI Y, et al. A graph attention spatio-temporal convolutional network for 3D human pose estimation in video[C]. 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021: 3374-3380. |
6 | LI Y, YANG S, LIU P, et al. SimCC: a simple coordinate classification perspective for human pose estimation[C]. Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part VI. Cham: Springer Nature Switzerland, 2022: 89-106. |
7 | SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 5693-5703. |
8 | YU B, YIN H, ZHU Z. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting[J]. arXiv preprint arXiv:, 2017. |
9 | SHI L, ZHANG Y, CHENG J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 12026-12035. |
10 | YE F, PU S, ZHONG Q, et al. Dynamic GCN: context-enriched topology learning for skeleton-based action recognition[C]. Proceedings of the 28th ACM International Conference on Multimedia, 2020: 55-63. |
11 | YANG B, FAN F C, YANG J C, et al. Recognizing pedestrians' crossing intentions based on action prediction and environment context[J]. Automotive Engineering, 2021, 43(7): 1066-1076. (in Chinese) |
12 | CADENA P R G, QIAN Y, WANG C, et al. Pedestrian Graph+: a fast pedestrian crossing prediction model based on graph convolutional networks[J]. IEEE Transactions on Intelligent Transportation Systems, 2022. |
13 | YANG D, ZHANG H, YURTSEVER E, et al. Predicting pedestrian crossing intention with feature fusion and spatio-temporal attention[J]. IEEE Transactions on Intelligent Vehicles, 2022, 7(2): 221-230. |
14 | NI R, YANG B, WEI Z, et al. Pedestrians crossing intention anticipation based on dual‐channel action recognition and hierarchical environmental context[J]. IET Intelligent Transport Systems, 2023, 17(2): 255-269. |
15 | RASOULI A, KOTSERUBA I, TSOTSOS J K. Are they going to cross? a benchmark dataset and baseline for pedestrian crosswalk behavior[C]. Proceedings of the IEEE International Conference on Computer Vision Workshops. 2017: 206-213. |
16 | RASOULI A, KOTSERUBA I, TSOTSOS J K. Pedestrian action anticipation using contextual feature fusion in stacked RNNs[J]. arXiv preprint arXiv:, 2020. |
17 | SHI J, LIU C, ISHI C T, et al. Skeleton-based emotion recognition based on two-stream self-attention enhanced spatial-temporal graph convolutional network[J]. Sensors, 2020, 21(1): 205. |
18 | STERGIOU A, POPPE R. Adapool: exponential adaptive pooling for information-retaining downsampling[J]. IEEE Transactions on Image Processing, 2022, 32: 251-266. |
19 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778. |
20 | CHEN Y, ZHANG Z, YUAN C, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 13359-13368. |
21 | GIRSHICK R. Fast R-CNN[C]. Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448. |
22 | LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2117-2125. |
23 | RASOULI A, KOTSERUBA I, KUNIC T, et al. PIE: a large-scale dataset and models for pedestrian intention estimation and trajectory prediction[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 6262-6271. |
24 | ZHOU Z, FAN X, SHI P, et al. R-MSFM: recurrent multi-scale feature modulation for monocular depth estimating[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 12777-12786. |
25 | BOUAZIZI A, HOLZBOCK A, KRESSEL U, et al. Motionmixer: MLP-based 3D human body pose forecasting[J]. arXiv preprint arXiv:, 2022. |
26 | KOTSERUBA I, RASOULI A, TSOTSOS J K. Benchmark for evaluating pedestrian action prediction[C]. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021: 1258-1268. |
27 | SHI X, CHEN Z, WANG H, et al. Convolutional LSTM network: a machine learning approach for precipitation nowcasting[J]. Advances in Neural Information Processing Systems, 2015, 28. |
28 | KOTSERUBA I, RASOULI A, TSOTSOS J K. Do they want to cross? understanding pedestrian intention for behavior prediction[C]. 2020 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2020: 1688-1693. |
29 | BHATTACHARYYA A, FRITZ M, SCHIELE B. Long-term on-board prediction of people in traffic scenes under uncertainty[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 4194-4202. |
30 | NG J Y, HAUSKNECHT M, VIJAYANARASIMHAN S, et al. Beyond short snippets: deep networks for video classification[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 4694-4702. |
31 | LIN R, LIU S, YANG M, et al. Hierarchical recurrent neural network for document modeling[C]. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015: 899-907. |
32 | TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]. Proceedings of the IEEE International Conference on Computer Vision, 2015: 4489-4497. |
33 | LI J, WANG C, ZHU H, et al. CrowdPose: efficient crowded scenes pose estimation and a new benchmark[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 10863-10872. |
34 | CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? a new model and the kinetics dataset[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6299-6308. |