Automotive Engineering ›› 2023, Vol. 45 ›› Issue (9): 1499-1515. doi: 10.19562/j.chinasae.qcgc.2023.ep.006
Special Topic: Intelligent and Connected Vehicle Technology - Planning & Decision-Making, 2023
Key Technologies of Brain-Inspired Decision and Control Intelligence for Autonomous Driving Systems
Shengbo Eben Li, Guojian Zhan, Yuxuan Jiang, Zhiqian Lan, Yuhang Zhang, Wenjun Zou, Chen Chen, Bo Cheng, Keqiang Li
Received: 2023-02-13
Revised: 2023-03-16
Online: 2023-09-25
Published: 2023-09-23
Contact: Shengbo Eben Li, E-mail: lishbo@tsinghua.edu.cn
Abstract:
As the next-generation technical direction for high-level autonomous driving, brain-inspired learning uses deep neural networks as the policy carrier and reinforcement learning as the training method: through interactive exploration of the environment, the policy evolves by itself until it converges to an optimal mapping from environment states to control actions. At present, brain-inspired learning is mainly applied to the design of decision and control functions for autonomous driving, and its key technologies include: the system architecture that frames policy design, the simulation platform that supports interactive training, the state representation that determines policy inputs, the evaluation metrics that define policy objectives, and the training algorithms that drive policy updates. This paper reviews the development of autonomous driving decision and control, covering two modular architectures (hierarchical and integrated) and three technical approaches (expert rule-based, supervised learning, and brain-inspired learning); surveys the current mainstream autonomous driving simulation platforms; analyzes three types of environment state representation for brain-inspired decision and control (object-based, feature-based, and combined); introduces five performance evaluation dimensions for autonomous vehicles (safety, regulatory compliance, comfort, traffic efficiency, and economy); then details typical reinforcement learning algorithms for vehicle-cloud collaborative training and their application status; and finally summarizes the challenges and development trends of brain-inspired autonomous driving technology.
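To make the abstract's core idea concrete, the sketch below shows a deep neural network acting as the policy carrier and a reinforcement learning update driving its self-evolution toward a state-to-action mapping. It is a minimal illustration assuming PyTorch: the four-dimensional car-following state, the Policy class, and the plain REINFORCE-style update are invented stand-ins for exposition, not the DDPG, SAC, DSAC, or ADP algorithms surveyed in the paper.

```python
# Minimal sketch (assumes PyTorch): a neural-network policy mapping an
# environment state to a driving action, improved by a policy-gradient step.
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Maps an environment state to a Gaussian distribution over actions."""
    def __init__(self, state_dim=4, action_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.mean = nn.Linear(64, action_dim)      # e.g. acceleration command
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        h = self.net(state)
        return torch.distributions.Normal(self.mean(h), self.log_std.exp())

def policy_gradient_step(policy, optimizer, states, actions, returns):
    """One REINFORCE-style update: raise the log-probability of each action
    in proportion to the return it achieved."""
    dist = policy(states)
    loss = -(dist.log_prob(actions).sum(-1) * returns).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random placeholder data standing in for simulator rollouts.
policy = Policy()
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
states = torch.randn(128, 4)    # e.g. [gap, ego speed, lead speed, lead accel]
actions = policy(states).sample()
returns = torch.randn(128)      # placeholder episode returns
policy_gradient_step(policy, optimizer, states, actions, returns)
```

In practice, the random placeholder data would be replaced by rollouts collected from a driving simulator such as CARLA or LasVSim, and the plain policy-gradient step by one of the actor-critic updates surveyed in the paper.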
Shengbo Eben Li, Guojian Zhan, Yuxuan Jiang, Zhiqian Lan, Yuhang Zhang, Wenjun Zou, Chen Chen, Bo Cheng, Keqiang Li. Key Technologies of Brain-Inspired Decision and Control Intelligence for Autonomous Driving Systems[J]. Automotive Engineering, 2023, 45(9): 1499-1515.
Table 3 Typical cases of brain-inspired decision and control
Reference | Architecture | Driving task | Simulator | State representation | Training algorithm | Real-vehicle test
---|---|---|---|---|---|---
Lillicrap et al. | E2E | Closed racing track | TORCS | Feature-based | DDPG |
Chen et al. | E2E | Two-lane roundabout | CARLA | Feature-based | SAC, TD3 |
Li et al. | E2E | Signalized intersection | MetaDrive | Object-based | PPO, SAC |
Duan et al. | E2E | Multi-lane road | LasVSim | Combined | DSAC | Yes
Hoel et al. | HDC | Lane-change decision | | Object-based | DQN |
Yurtsever et al. | HDC | Motion control | CARLA | Feature-based | DQN |
Liu et al. | HDC | Motion control | | Object-based | RMPC |
Guan et al. | IDC | Intersection | LasVSim | Object-based | ADP | Yes
Gu et al. | IDC | Multi-lane highway | | Combined | SAC |
Ren et al. | IDC | Signalized intersection | LasVSim | Combined | ADP |
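The three state-representation styles in the table differ mainly in what the policy network consumes: object-based states list the kinematics of surrounding traffic participants, feature-based states feed sensor-derived features such as camera or lidar encodings, and combined states concatenate both. Below is a hypothetical sketch of building a fixed-dimensional object-based state; the field names, nearest-k rule, and padding sentinel are illustrative assumptions, not taken from the cited works.

```python
# Hypothetical object-based state construction; field names are invented.
import numpy as np

def object_based_state(ego, surrounding, k=4):
    """Concatenate the ego speed with the relative kinematics of the k
    nearest surrounding vehicles, padded to a fixed dimension."""
    rel = sorted(
        ((v["x"] - ego["x"], v["y"] - ego["y"], v["v"] - ego["v"])
         for v in surrounding),
        key=lambda r: r[0] ** 2 + r[1] ** 2,
    )[:k]
    # Pad with a sentinel "no vehicle" entry so the dimension stays fixed.
    rel += [(100.0, 100.0, 0.0)] * (k - len(rel))
    return np.concatenate([[ego["v"]], np.ravel(rel)]).astype(np.float32)

# Example: ego vehicle plus two surrounding vehicles -> a (1 + 4*3,) vector.
ego = {"x": 0.0, "y": 0.0, "v": 12.0}
others = [{"x": 25.0, "y": 3.5, "v": 10.0}, {"x": -18.0, "y": 0.0, "v": 14.0}]
state = object_based_state(ego, others)   # shape (13,)
```

A fixed dimension matters because the policy network has a fixed input width while the number of surrounding vehicles varies; permutation-invariant encoders (e.g., reference 34) are an alternative to the sorting-and-padding used here.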
1 | LI S E, GUAN Y, HOU L, et al. Key technique of deep neural network and its applications in autonomous driving[J]. Journal of Automotive Safety and Energy, 2019, 10(2): 119-145. |
2 | HANCOCK P A, NOURBAKHSH I, STEWART J. On the future of transportation in an era of automated and autonomous vehicles[J]. Proceedings of the National Academy of Sciences, 2019, 116(16): 7684-7691. |
3 | KALRA N, PADDOCK S M. Driving to safety: how many miles of driving would it take to demonstrate autonomous vehicle reliability[J]. Transportation Research Part A: Policy and Practice, 2016, 94: 182-193. |
4 | LI K Q, DAI Y F, LI S E, et al. State-of-the-art and technical trends of intelligent and connected vehicles[J]. Journal of Automotive Safety and Energy, 2017, 8(1): 1-14. |
5 | DING F, ZHANG N, LI S E, et al. A survey of architecture and key technologies of intelligent connected vehicle-road-cloud cooperation system[J]. Acta Automatica Sinica, 2022, 48: 1-24. |
6 | URMSON C, BAKER C, DOLAN J, et al. Autonomous driving in traffic: Boss and the Urban Challenge[J]. AI Magazine, 2009, 30(2): 17-28. |
7 | MONTEMERLO M, BECKER J, BHAT S, et al. Junior: the Stanford entry in the Urban Challenge[J]. Journal of Field Robotics, 2008, 25(9): 569-597. |
8 | BOJARSKI M, DEL TESTA D, DWORAKOWSKI D, et al. End to end learning for self-driving cars[J]. arXiv preprint, 2016. |
9 | VALLON C, ERCAN Z, CARVALHO A, et al. A machine learning approach for personalized autonomous lane change initiation and control[C]. Intelligent Vehicles Symposium (IV). IEEE, 2017: 1590-1595. |
10 | RESCORLA R A. Behavioral studies of Pavlovian conditioning[J]. Annual Review of Neuroscience, 1988, 11(1): 329-352. |
11 | THORNDIKE E L. Animal intelligence: experimental studies[M]. Transaction Publishers, 1911. |
12 | SCHULTZ W, DAYAN P, MONTAGUE P R. A neural substrate of prediction and reward[J]. Science, 1997, 275(5306): 1593-1599. |
13 | LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]. International Conference on Learning Representations (ICLR). 2016. |
14 | GUAN Y, REN Y, SUN Q, et al. Integrated decision and control: toward interpretable and efficient driving intelligence[J]. IEEE Transactions on Cybernetics, 2022, 53(2): 859-873. |
15 | GUAN Y, TANG L, LI C, et al. Integrated decision and control for high-level automated vehicles by mixed policy gradient and its experiment verification[J]. arXiv preprint, 2022. |
16 | JIANG J, REN Y, GUAN Y, et al. Integrated decision and control at multi-lane intersections with mixed traffic flow[J]. Journal of Physics: Conference Series, 2022, 2234(1): 012015. |
17 | CAI P, SUN Y, CHEN Y, et al. Vision-based trajectory planning via imitation learning for autonomous vehicles[C]. International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2019: 2736-2742. |
18 | HOEL C J, WOLFF K, LAINE L. Automated speed and lane change decision making using deep reinforcement learning[C]. International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2018: 2148-2155. |
19 | YURTSEVER E, CAPITO L, REDMILL K, et al. Integrating deep reinforcement learning with model-based path planners for automated driving[C]. Intelligent Vehicles Symposium (IV). IEEE, 2020: 1311-1316. |
20 | DUAN J, LI S E, GUAN Y, et al. Hierarchical reinforcement learning for self‐driving decision‐making without reliance on labelled driving data[J]. IET Intelligent Transport Systems, 2020, 14(5): 297-305. |
21 | LIU Z, DUAN J, WANG W, et al. Recurrent model predictive control: learning an explicit recurrent controller for nonlinear systems[J]. IEEE Transactions on Industrial Electronics, 2022, 69(10): 10437-10446. |
22 | LIN Z, DUAN J, LI S E, et al. Policy-iteration-based finite-horizon approximate dynamic programming for continuous-time nonlinear optimal control[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022. |
23 | REN Y, JIANG J, ZHAN G, et al. Self-learned intelligence for integrated decision and control of automated vehicles at signalized intersections [J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(12): 24145-24156. |
24 | GU Z, YIN Y, LI S E, et al. Integrated eco-driving automation of intelligent vehicles in multi-lane scenario via model-accelerated reinforcement learning [J]. Transportation Research Part C: Emerging Technologies, 2022, 144: 103863. |
25 | GUAN Y, REN Y, MA H, et al. Learn collision-free self-driving skills at urban intersections with model-based reinforcement learning[C]. International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2021: 3462-3469. |
26 | CHEN D, ZHOU B, KOLTUN V, et al. Learning by cheating[C]. Conference on Robot Learning (CoRL). 2020: 66-75. |
27 | CHEN J, YUAN B, TOMIZUKA M. Model-free deep reinforcement learning for urban autonomous driving[C]. International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2019: 2765-2771. |
28 | PENG B, SUN Q, LI S E, et al. End-to-End autonomous driving through dueling double deep Q-network [J]. Automotive Innovation, 2021, 4(3): 328-337. |
29 | LI Q, PENG Z, FENG L, et al. MetaDrive: composing diverse driving scenarios for generalizable reinforcement learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022. |
30 | DUAN J, ZHANG F, LI S E, et al. Applications of distributional soft actor-critic in real-world autonomous driving[C]. International Conference on Computer, Control and Robotics (ICCCR). IEEE, 2022: 109-114. |
31 | CHEN J, LI S E, TOMIZUKA M. Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(6): 5068-5078. |
32 | LESORT T, DÍAZ-RODRÍGUEZ N, GOUDOU J F, et al. State representation learning for control: an overview[J]. Neural Networks, 2018, 108: 379-392. |
33 | DE BRUIN T, KOBER J, TUYLS K, et al. Integrating state representation learning into deep reinforcement learning[J]. IEEE Robotics and Automation Letters, 2018, 3(3): 1394-1401. |
34 | DUAN J, YU D, LI S E, et al. Fixed-dimensional and permutation invariant state representation of autonomous driving[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(7): 9518-9528. |
35 | ISELE D, RAHIMI R, COSGUN A, et al. Navigating occluded intersections with autonomous vehicles using deep reinforcement learning[C]. International Conference on Robotics and Automation (ICRA). IEEE, 2018: 2034-2039. |
36 | GE Q, SUN Q, LI S E, et al. Numerically stable dynamic bicycle model for discrete-time control[C]. Intelligent Vehicles Symposium. IEEE, 2021: 128-134. |
37 | LI G, YANG Y, LI S E, et al. Decision making of autonomous vehicles in lane change scenarios: deep reinforcement learning approaches with risk awareness[J]. Transportation Research Part C: Emerging Technologies, 2022, 134: 103452. |
38 | REN Y, DUAN J, LI S E, et al. Improving generalization of reinforcement learning with minimax distributional soft actor-critic[C]. International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020: 1-6. |
39 | LIN Z, DUAN J, LI S E, et al. Continuous-time finite-horizon ADP for automated vehicle controller design with high efficiency[C]. International Conference on Unmanned Systems (ICUS). IEEE, 2020: 978-984. |
40 | XIN L, KONG Y, LI S E, et al. Enable faster and smoother spatio-temporal trajectory planning for autonomous vehicles in constrained dynamic environment[J]. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, 2021, 235(4): 1101-1112. |
41 | YU D, MA H, LI S E, et al. Reachability constrained reinforcement learning[C]. International Conference on Machine Learning (ICML). PMLR, 2022: 25636-25655. |
42 | LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444. |
43 | QI C R, SU H, MO K, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]. Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 652-660. |
44 | YU Y, SI X, HU C, et al. A review of recurrent neural networks: LSTM cells and network architectures[J]. Neural Computation, 2019, 31(7): 1235-1270. |
45 | CRESWELL A, WHITE T, DUMOULIN V, et al. Generative adversarial networks: an overview[J]. IEEE Signal Processing Magazine, 2018, 35(1): 53-65. |
46 | WANG Y, YAO H, ZHAO S. Auto-encoder based dimensionality reduction[J]. Neurocomputing, 2016, 184: 232-242. |
47 | KINGMA D P, WELLING M. An introduction to variational autoencoders[J]. Foundations and Trends in Machine Learning, 2019, 12(4): 307-392. |
48 | MU Y M, CHEN S, DING M, et al. CtrlFormer: learning transferable state representation for visual control via transformer[C]. International Conference on Machine Learning (ICML), 2022: 16043-16061. |
49 | ZAHEER M, KOTTUR S, RAVANBAKHSH S, et al. Deep sets[C]. Advances in Neural Information Processing Systems (NIPS), 2017, 30. |
50 | MARON H, LITANY O, CHECHIK G, et al. On learning sets of symmetric elements[C]. International Conference on Machine Learning (ICML), 2020: 6734-6744. |
51 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]. Advances in Neural Information Processing Systems (NIPS), 2017, 30. |
52 | FENG S, SUN H, YAN X, et al. Dense reinforcement learning for safety validation of autonomous vehicles[J]. Nature, 2023,615(7953): 620-627. |
53 | SOBHANI A, YOUNG W, BAHROLOLOOM S, et al. Calculating time-to-collision for analysing right turning behaviour at signalised intersections[J]. Road & Transport Research: A Journal of Australian and New Zealand Research and Practice, 2013, 22(3): 49-61. |
54 | KOLEKAR S, DE WINTER J, ABBINK D. Human-like driving behaviour emerges from a risk-based driver model[J]. Nature Communications, 2020, 11(1): 1-13. |
55 | CHEN C, LAN Z, ZHAN G, et al. Podar: modeling driver's perceived risk with situation awareness theory[J]. Available at SSRN 4129030. |
56 | LI S E, LI K, WANG J. Economy-oriented vehicle adaptive cruise control with coordinating multiple objectives function[J]. Vehicle System Dynamics, 2013, 51(1): 1-17. |
57 | LI S E. Reinforcement learning for sequential decision and optimal control[M]. Springer, 2023. |
58 | MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. |
59 | FUJIMOTO S, HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[C]. International Conference on Machine Learning (ICML), 2018: 1587-1596. |
60 | SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[J]. arXiv preprint, 2017. |
61 | HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]. International Conference on Machine Learning (ICML), 2018: 1861-1870. |
62 | DUAN J, GUAN Y, LI S E, et al. Distributional soft actor-critic: off-policy reinforcement learning for addressing value estimation errors[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 33(11): 6584-6598. |
63 | GUAN Y, DUAN J, LI S E, et al. Mixed policy gradient[J]. arXiv preprint, 2021. |
64 | MU Y, PENG B, GU Z, et al. Mixed reinforcement learning for efficient policy optimization in stochastic environments[C]. International Conference on Control, Automation and Systems (ICCAS). IEEE, 2020: 1212-1219. |
65 | GUAN Y, LI S E, DUAN J, et al. Direct and indirect reinforcement learning[J]. International Journal of Intelligent Systems, 2021, 36(8): 4439-4467. |