Systems Engineering and Electronics ›› 2023, Vol. 45 ›› Issue (10): 3165-3171. doi: 10.12305/j.issn.1001-506X.2023.10.21
• Systems Engineering •
Zhiruo ZHAO, Lei CAO, Xiliang CHEN, Jun LAI, Legui ZHANG
Received: 2021-10-25
Online: 2023-09-25
Published: 2023-10-11
Contact: Lei CAO
CLC Number:
Zhiruo ZHAO, Lei CAO, Xiliang CHEN, Jun LAI, Legui ZHANG. UAV intelligent attack strategy generation model based on multi-agent game reinforcement learning[J]. Systems Engineering and Electronics, 2023, 45(10): 3165-3171.
Table 1 Initial positions and parameters of mounted weapons of red and blue sides
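Mounted weapon | Red/blue unit | Initial position |
AGM-114K Hellfire II anti-tank missile | UAV 1 | Heading: 230°; speed: 129 km/h; altitude: 777 m; longitude: 43°56′37″ E; latitude: 33°52′11″ N |
AGM-114K Hellfire II anti-tank missile | UAV 2 | Heading: 91°; speed: 129 km/h; altitude: 610 m; longitude: 43°58′06″ E; latitude: 32°54′23″ N |
AGM-114K Hellfire II anti-tank missile | UAV 3 | Heading: 230°; speed: 248 km/h; altitude: 777 m; longitude: 45°09′16″ E; latitude: 33°48′30″ N |
- | Tank platoon 1 | Longitude: 44°09′09″ E; latitude: 33°45′09″ N |
- | Tank platoon 2 | Longitude: 44°07′44″ E; latitude: 33°16′38″ N |
- | Tank platoon 3 | Longitude: 44°19′35″ E; latitude: 33°24′24″ N |
- | Tank platoon 4 | Longitude: 44°35′10″ E; latitude: 33°38′28″ N |
- | Tank platoon 5 | Longitude: 44°35′04″ E; latitude: 33°24′24″ N |
- | Tank platoon 6 | Longitude: 44°35′20″ E; latitude: 33°11′37″ N |
- | Tank platoon 7 | Longitude: 44°49′43″ E; latitude: 33°24′35″ N |
- | Tank platoon 8 | Longitude: 45°01′42″ E; latitude: 33°31′36″ N |
- | Tank platoon 9 | Longitude: 44°59′43″ E; latitude: 33°04′46″ N |
SA-22 "Greyhound" surface-to-air missile | Surface-to-air missile platoon 1 | Longitude: 44°20′07″ E; latitude: 33°36′55″ N |
SA-22 "Greyhound" surface-to-air missile | Surface-to-air missile platoon 2 | Longitude: 44°20′45″ E; latitude: 33°12′18″ N |
SA-22 "Greyhound" surface-to-air missile | Surface-to-air missile platoon 3 | Longitude: 44°49′51″ E; latitude: 33°36′28″ N |
SA-22 "Greyhound" surface-to-air missile | Surface-to-air missile platoon 4 | Longitude: 44°50′46″ E; latitude: 33°12′28″ N |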