| 1 | 孙彧, 李清伟, 徐志雄,等.基于多智能体深度强化学习的空战博弈对抗策略训练模型[J].指挥信息系统与技术,2021,12(2):16-20. | 
																													
|  | SUN Y, LI Q W, XU Z X, et al. Game confrontation strategy training model for air combat based on multi-agent deep reinforcement learning[J]. Command Information System and Technology, 2021, 12(2): 16-20. | 
																													
																						| 2 | 陈希亮, 曹雷, 沈驰.基于深度逆向强化学习的行动序列规划问题研究[J].国防科技,2019,40(4):55-61. | 
																													
|  | CHEN X L, CAO L, SHEN C. Research on action sequence planning based on deep inverse reinforcement learning[J]. National Defense Science & Technology, 2019, 40(4): 55-61. | 
																													
																						| 3 | 曹雷, 孙彧, 陈希亮,等.联合作战任务智能规划关键技术及其应用思考[J].国防科技,2020,41(3):49-56. | 
																													
|  | CAO L, SUN Y, CHEN X L, et al. Key technology and application of intelligent mission planning in joint operations[J]. National Defense Science & Technology, 2020, 41(3): 49-56. | 
																													
																						| 4 | 曹雷, 陈希亮, 汤伟.智能化陆军建设[J].国防科技,2019,40(4):14-19. | 
																													
|  | CAO L, CHEN X L, TANG W. Intelligent army construction[J]. National Defense Science & Technology, 2019, 40(4): 14-19. | 
																													
																						| 5 | 陈希亮, 李清伟, 孙彧.基于博弈对抗的空战智能决策关键技术[J].指挥信息系统与技术,2021,12(2):6. | 
																													
|  | CHEN X L, LI Q W, SUN Y. Key technologies for air combat intelligent decision based on game confrontation[J]. Command Information System and Technology, 2021, 12(2): 6. | 
																													
																						| 6 | SUNEHAG P, LEVER G, GRUSLYS A, et al. Value-decomposition networks for cooperative multi-agent learning[C]// Proc. of the 17th International Conference on Autonomous Agents and Multiagent Systems, 2018: 10-15. | 
																													
| 7 | RASHID T, SAMVELYAN M, WITT C D, et al. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning[C]//Proc. of the 35th International Conference on Machine Learning, 2018: 4295-4304. | 
																													
																						| 8 | YANG Y, RUI L, LI M, et al. Mean field multi-agent reinforcement learning[C]//Proc. of the 35th International Conference on Machine Learning, 2018: 5571-5580. | 
																													
| 9 | FOERSTER J N, CHEN R Y, AL-SHEDIVAT M, et al. Learning with opponent-learning awareness[C]//Proc. of the 17th International Conference on Autonomous Agents and Multiagent Systems, 2018: 122-130. | 
																													
																						| 10 | PENG P, WEN Y, YANG Y, et al. Multiagent bidirectionally-coordinated nets: emergence of human-level coordination in learning to play starcraft combat games[EB/OL]. [2021-10-10]. https://arxiv.org/pdf/1703.10069.pdf. | 
																													
																						| 11 | HU D P, JIANG X S, WEI X M, et al. State representation learning for minimax deep deterministic policy gradient[C]//Proc. of the 12th International Conference on Knowledge Science, Engineering and Management, 2019: 481-487. | 
																													
																						| 12 | YANG Y D, RUI L, LI M N, et al. Mean field multi-agent reinforcement learning[C]//Proc. of the 35th International Conference on Machine Learning, 2018. | 
																													
| 13 | FOERSTER J N, CHEN R Y, AL-SHEDIVAT M, et al. Learning with opponent-learning awareness[C]//Proc. of the 17th International Conference on Autonomous Agents and Multiagent Systems, 2018: 122-130. | 
																													
																						| 14 | LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[EB/OL]. [2021-10-10]. https://arxiv.org/abs/1706.02275. | 
																													
																						| 15 | HERNANDEZ-LEAL P, KAISERS M, BAARSLAG T, et al. A survey of learning in multiagent environments: dealing with non-stationarity[EB/OL]. [2021-10-10]. https://arxiv.org/abs/1707.09183. | 
																													
| 16 | WATSON J. Strategy: an introduction to game theory[M]. New York: W. W. Norton & Company, 2013. | 
																													
| 17 | SUTTON R, BARTO A. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 1998. | 
																													
| 18 | LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444. | 
																													
																						| 19 | LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]//Proc. of the 4th International Conference on Learning Representations, 2016. | 
																													
																						| 20 | SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. [2021-10-10]. https://arxiv.org/abs/1707.06347. | 
																													
| 21 | RASMUSEN E. Games and information: an introduction to game theory[J]. International Journal of Industrial Organization, 1991, 9(3): 474-476. | 
																													
																						| 22 | LEE K, RENGARAJAN D, KALATHIL D, et al. Learning trembling hand perfect mean field equilibrium for dynamic mean field games[EB/OL]. [2021-10-10]. https://arxiv.org/abs/2006.11683. | 
																													
| 23 | SHAPLEY L S. Stochastic games[J]. Proceedings of the National Academy of Sciences, 1953, 39(10): 1095-1100. | 
																													
| 24 | BAŞAR T. Dynamic noncooperative game theory[J]. Society for Industrial and Applied Mathematics, 1982, 19(2): 139-152. | 
																													
																						| 25 | 郝峰, 张栋, 唐硕,等.基于改进RRT算法的巡飞弹快速航迹规划方法[J].飞行力学,2019,37(3):58-63. | 
																													
|  | HAO F, ZHANG D, TANG S, et al. A rapid route planning method of loitering munitions based on improved RRT algorithm[J]. Flight Mechanics, 2019, 37(3): 58-63. | 
																													
																						| 26 | YU C, VELU A, VINITSKY E, et al. The surprising effectiveness of MAPPO in cooperative, multi-agent games[EB/OL]. [2021-10-10]. https://arxiv.org/abs/2103.01955. | 
																													
																						| 27 | BOOTH J. PPO dash: improving generalization in deep reinforcement learning[EB/OL]. [2021-10-10]. https://arxiv.org/abs/1907.06704. | 
																													
																						| 28 | ENGSTROM L, ILYAS A, SANTURKAR S, et al. Implementation matters in deep policy gradients: a case study on PPO and TRPO[EB/OL]. [2021-10-10]. https://arxiv.org/abs/2005.12729. | 
																													
| 29 | PEDREGOSA F, VAROQUAUX G, GRAMFORT A, et al. Scikit-learn: machine learning in Python[J]. The Journal of Machine Learning Research, 2011, 12(4): 2825-2830. | 
																													
| 30 | ABADI M, BARHAM P, CHEN J, et al. TensorFlow: a system for large-scale machine learning[C]//Proc. of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016: 265-283. |