Event-triggered communication for multiple unmanned ground vehicle collaboration based on MADDPG

doi:10.12305/j.issn.1001-506X.2024.07.35

Abstract

Typical end-to-end communication strategies cannot determine the interval between communications and can only communicate at a fixed frequency. To address this, an event-triggered, variable-frequency communication strategy based on deep reinforcement learning is proposed to solve the minimal-communication problem in collaboration among multiple unmanned ground vehicles. First, an event-triggered architecture is established. It consists mainly of a controller that computes when to communicate, together with trigger conditions, so that the vehicles communicate only when the conditions are met, which greatly reduces the total communication volume. Second, the trigger mechanism is optimized with the multi-agent deep deterministic policy gradient (MADDPG) algorithm, which improves the convergence speed of the algorithm. Simulation and real-vehicle experiments show that, as the number of iterations increases, the amount of communication data in the multi-vehicle system is reduced by 55.74% while the collaborative task is still accomplished, validating the effectiveness of the proposed strategy.

Key words: event-triggered communication, deep reinforcement learning, collaborative pursuit, multiple unmanned ground vehicles
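
The trigger idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's learned controller: the fixed threshold rule, the `EventTrigger` class, and its `threshold` parameter are assumptions chosen only for illustration. A vehicle broadcasts its state when it has drifted far enough from the last state it sent, so messages go out at a variable rate instead of a fixed one.

```python
import math


class EventTrigger:
    """Hypothetical threshold trigger: send only when the state has drifted enough."""

    def __init__(self, threshold):
        self.threshold = threshold   # assumed trigger parameter (metres)
        self.last_sent = None        # last state that was actually broadcast

    def should_send(self, state):
        # Always send the first state; afterwards send only on a large deviation.
        if self.last_sent is None:
            return True
        return math.dist(state, self.last_sent) > self.threshold

    def step(self, state):
        """Return True and record the state if a message is sent at this step."""
        if self.should_send(state):
            self.last_sent = tuple(state)
            return True
        return False


def count_messages(trajectory, threshold):
    """Count the messages an EventTrigger would send along a trajectory."""
    trigger = EventTrigger(threshold)
    return sum(trigger.step(state) for state in trajectory)


# A vehicle moving 0.1 m per step along x for 100 steps (made-up data).
trajectory = [(0.1 * k, 0.0) for k in range(100)]
fixed = len(trajectory)                       # fixed frequency: one message per step
triggered = count_messages(trajectory, 0.45)  # event-triggered, 0.45 m threshold
print(fixed, triggered)                       # prints: 100 20
```

In the paper the decision of when to communicate is optimized with MADDPG; the hand-set threshold here only shows why sending on events rather than on a clock lowers the message count.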

CLC number: T249

Hongda GUO, Jingtao LOU, Youchun XU, Peng YE, Yongle LI, Jinsheng CHEN. Event-triggered communication of multiple unmanned ground vehicles collaborative based on MADDPG[J]. Systems Engineering and Electronics, 2024, 46(7): 2525-2533.

Figures and tables (12): Fig. 1, Fig. 2, Table 1, Fig. 3, Fig. 4, Fig. 5, Table 2, Fig. 6, Table 3, Fig. 7, Fig. 8, Table 4


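Variable	Value
Control period/s	0.1
Communication period/s	0.02
Steps per episode	100
Number of episodes	10⁶
Number of hidden layers	4
Units per layer	64
Hidden-layer activation function	ReLU
Critic network output-layer activation function	Linear
Actor network output-layer activation function	Tanh
Discount factor	0.99
Batch size	4096
Replay buffer size	1.0×10⁶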
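
The layer settings in this table can be read as a small multilayer perceptron. The pure-Python sketch below builds an actor with four hidden layers of 64 ReLU units and a Tanh output, plus a critic with a linear output, following the table. The observation size (`obs_dim = 8`), the action size (`act_dim = 2`), and the weight initialization are assumptions, since they are not given here. It runs a forward pass only and does no training.

```python
import math
import random

HIDDEN_LAYERS = 4   # from the table
HIDDEN_UNITS = 64   # from the table


def init_mlp(in_dim, out_dim, rng):
    """Weights for in_dim -> 4 x 64 -> out_dim; the initialization is an assumption."""
    sizes = [in_dim] + [HIDDEN_UNITS] * HIDDEN_LAYERS + [out_dim]
    layers = []
    for m, n in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / m)
        weights = [[rng.gauss(0.0, scale) for _ in range(m)] for _ in range(n)]
        layers.append((weights, [0.0] * n))
    return layers


def dense(weights, biases, x):
    """One fully connected layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def forward(layers, x, out_activation):
    """ReLU on every hidden layer, then the given output activation."""
    for weights, biases in layers[:-1]:
        x = [max(v, 0.0) for v in dense(weights, biases, x)]
    weights, biases = layers[-1]
    return [out_activation(v) for v in dense(weights, biases, x)]


rng = random.Random(0)
obs_dim, act_dim = 8, 2   # hypothetical sizes, not given in the source

actor = init_mlp(obs_dim, act_dim, rng)
# Simplification: a MADDPG critic sees all agents' observations and actions;
# this one takes a single agent's observation and action.
critic = init_mlp(obs_dim + act_dim, 1, rng)

obs = [rng.gauss(0.0, 1.0) for _ in range(obs_dim)]
action = forward(actor, obs, math.tanh)               # Tanh output, each value in [-1, 1]
q_value = forward(critic, obs + action, lambda v: v)  # linear output
print(len(action), len(q_value))
```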

Parameter	Value
Control period/s	0.25
Communication period/s	0.05
Steps per episode	150
Number of training iterations	10⁶
Batch size	4096
Replay buffer size	10⁶
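
The timing rows of the two settings tables imply a few derived quantities, which the short Python sketch below computes. Two readings are assumptions: that one step per episode corresponds to one control period, and that a fixed-frequency scheme would send one message per communication period. The function name `timing` is illustrative.

```python
from fractions import Fraction


def timing(control_period, comm_period, steps_per_episode):
    """Derive episode length and communication slots from the tabulated periods."""
    control = Fraction(control_period)   # exact decimal arithmetic, no float error
    comm = Fraction(comm_period)
    slots_per_step = control / comm
    return {
        "episode_seconds": float(control * steps_per_episode),
        "slots_per_control_step": float(slots_per_step),
        "slots_per_episode": float(slots_per_step * steps_per_episode),
    }


first = timing("0.1", "0.02", 100)     # first settings table
second = timing("0.25", "0.05", 150)   # second settings table
print(first)
print(second)
```

Under these assumptions both configurations allow five communication slots per control step, which is the budget an event-triggered scheme tries not to use in full.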

Communication strategy	Success rate
High fixed frequency	0.91
Low fixed frequency	0.77
No communication	0.35
Event-triggered communication	0.89

Communication strategy	Average time/s	Total path length/m	Cumulative data sent/m
Fixed-frequency communication	55.36	166.91	2049.82
Event-triggered communication	57.22	180.17	907.33
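
As a check on the table above, the relative change of each metric can be computed directly. The Python sketch below does this; the helper name `relative_change` is illustrative. The data-volume figure reproduces the 55.74% reduction quoted in the abstract.

```python
def relative_change(baseline, value):
    """Signed change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# (fixed-frequency, event-triggered) pairs taken from the table above
metrics = {
    "average time (s)": (55.36, 57.22),
    "total path length (m)": (166.91, 180.17),
    "cumulative data sent": (2049.82, 907.33),
}

changes = {name: round(relative_change(a, b), 2) for name, (a, b) in metrics.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
# average time (s): +3.36%
# total path length (m): +7.94%
# cumulative data sent: -55.74%
```

By these figures, the event-triggered strategy costs about 3% more time and 8% more path length in exchange for sending 55.74% less data.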
