[1] SUN Y, LI Q W, XU Z X, et al. Game confrontation strategy training model for air combat based on multi-agent deep reinforcement learning[J]. Command Information System and Technology, 2021, 12(2): 16-20.
[2] CHEN X L, CAO L, SHEN C. Research on action sequence planning based on deep inverse reinforcement learning[J]. National Defense Science & Technology, 2019, 40(4): 55-61.
[3] CAO L, SUN Y, CHEN X L, et al. Key technology and application of intelligent mission planning in joint operations[J]. National Defense Science & Technology, 2020, 41(3): 49-56.
[4] CAO L, CHEN X L, TANG W. Intelligent army construction[J]. National Defense Science & Technology, 2019, 40(4): 14-19.
[5] CHEN X L, LI Q W, SUN Y. Key technologies for air combat intelligent decision based on game confrontation[J]. Command Information System and Technology, 2021, 12(2): 6.
[6] SUNEHAG P, LEVER G, GRUSLYS A, et al. Value-decomposition networks for cooperative multi-agent learning[C]//Proc. of the 17th International Conference on Autonomous Agents and Multiagent Systems, 2018: 10-15.
[7] RASHID T, SAMVELYAN M, WITT C D, et al. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning[C]//Proc. of the 35th International Conference on Machine Learning, 2018: 4295-4304.
[8] YANG Y, RUI L, LI M, et al. Mean field multi-agent reinforcement learning[C]//Proc. of the 35th International Conference on Machine Learning, 2018: 5571-5580.
[9] FOERSTER J N, CHEN R Y, AL-SHEDIVAT M, et al. Learning with opponent-learning awareness[C]//Proc. of the 17th International Conference on Autonomous Agents and Multiagent Systems, 2017: 122-130.
[10] PENG P, WEN Y, YANG Y, et al. Multiagent bidirectionally-coordinated nets: emergence of human-level coordination in learning to play StarCraft combat games[EB/OL]. [2021-10-10]. https://arxiv.org/pdf/1703.10069.pdf.
[11] HU D P, JIANG X S, WEI X M, et al. State representation learning for minimax deep deterministic policy gradient[C]//Proc. of the 12th International Conference on Knowledge Science, Engineering and Management, 2019: 481-487.
[12] YANG Y D, RUI L, LI M N, et al. Mean field multi-agent reinforcement learning[C]//Proc. of the 35th International Conference on Machine Learning, 2018: 5571-5580.
[13] FOERSTER J N, CHEN R Y, AL-SHEDIVAT M, et al. Learning with opponent-learning awareness[C]//Proc. of the 17th International Conference on Autonomous Agents and Multiagent Systems, 2017: 122-130.
[14] LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[EB/OL]. [2021-10-10]. https://arxiv.org/abs/1706.02275.
[15] HERNANDEZ-LEAL P, KAISERS M, BAARSLAG T, et al. A survey of learning in multiagent environments: dealing with non-stationarity[EB/OL]. [2021-10-10]. https://arxiv.org/abs/1707.09183.
[16] WATSON J. Strategy: an introduction to game theory[M]. New York: W. W. Norton & Company, 2013.
[17] SUTTON R, BARTO A. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 1998.
[18] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
[19] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]//Proc. of the 4th International Conference on Learning Representations, 2016.
[20] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. [2021-10-10]. https://arxiv.org/abs/1707.06347.
[21] RASMUSEN E. Games and information: an introduction to game theory[J]. International Journal of Industrial Organization, 1991, 9(3): 474-476.
[22] LEE K, RENGARAJAN D, KALATHIL D, et al. Learning trembling hand perfect mean field equilibrium for dynamic mean field games[EB/OL]. [2021-10-10]. https://arxiv.org/abs/2006.11683.
[23] SHAPLEY L S. Stochastic games[J]. Proceedings of the National Academy of Sciences, 1953, 39(10): 1095-1100.
[24] BAŞAR T. Dynamic noncooperative game theory[J]. Society for Industrial and Applied Mathematics, 1982, 19(2): 139-152.
[25] HAO F, ZHANG D, TANG S, et al. A rapid route planning method of loitering munitions based on improved RRT algorithm[J]. Flight Mechanics, 2019, 37(3): 58-63.
[26] YU C, VELU A, VINITSKY E, et al. The surprising effectiveness of MAPPO in cooperative, multi-agent games[EB/OL]. [2021-10-10]. https://arxiv.org/abs/2103.01955.
[27] BOOTH J. PPO dash: improving generalization in deep reinforcement learning[EB/OL]. [2021-10-10]. https://arxiv.org/abs/1907.06704.
[28] ENGSTROM L, ILYAS A, SANTURKAR S, et al. Implementation matters in deep policy gradients: a case study on PPO and TRPO[EB/OL]. [2021-10-10]. https://arxiv.org/abs/2005.12729.
[29] PEDREGOSA F, VAROQUAUX G, GRAMFORT A, et al. Scikit-learn: machine learning in Python[J]. The Journal of Machine Learning Research, 2011, 12(4): 2825-2830.
[30] ABADI M, BARHAM P, CHEN J, et al. TensorFlow: a system for large-scale machine learning[C]//Proc. of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016: 265-283.