[1] ZHOU Z Y, LIU G J, TANG Y. Multi-agent reinforcement learning: methods, applications, visionary prospects, and challenges[EB/OL]. [2023-09-05]. https://doi.org/10.48550/arXiv.2305.10091.
[2] WEN M N, KUBA J, LIN R J, et al. Multi-agent reinforcement learning is a sequence modeling problem[J]. Advances in Neural Information Processing Systems, 2022, 35: 16509-16521.
[3] VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354. doi: 10.1038/s41586-019-1724-z.
[4] GAO Y M, LIU F Y, WANG L, et al. Towards effective and interpretable human-agent collaboration in MOBA games: a communication perspective[C]//Proc. of the 11th International Conference on Learning Representations, 2023.
[5] ZHANG L, LI J, HOU J Y, et al. Research on multi-UAV cooperative confrontation algorithm based on improved reinforcement learning[J]. Journal of Ordnance Equipment Engineering, 2023, 44(5): 230-238.
[6] POPE A P, IDE J S, MICOVIC D, et al. Hierarchical reinforcement learning for air combat at DARPA's AlphaDogfight trials[J]. IEEE Trans. on Artificial Intelligence, 2022, 4(6): 1371-1385.
[7] SMIT A, ENGELBRECHT H A, BRINK W, et al. Scaling multi-agent reinforcement learning to full 11 versus 11 simulated robotic football[J]. Autonomous Agents and Multi-Agent Systems, 2023, 37(1): 30.
[8] SUN H H, HU C H, ZHANG J G. Multi-robot reinforcement learning cooperative confrontation strategy based on an active risk defense mechanism[J]. Control and Decision, 2023, 38(5): 1429-1450.
[9] ZHANG T. Opponent modelling in multi-agent systems[D]. London: University College London, 2021.
[10] HU H M, SHI D X, YANG H H, et al. Independent multi-agent reinforcement learning using common knowledge[C]//Proc. of the IEEE International Conference on Systems, Man, and Cybernetics, 2022: 2703-2708.
[11] ROSMAN B, HAWASLY M, RAMAMOORTHY S. Bayesian policy reuse[J]. Machine Learning, 2016, 104: 99-127. doi: 10.1007/s10994-016-5547-y.
[12] HE L, SHEN L, LI H, et al. Survey on policy reuse in reinforcement learning[J]. Systems Engineering and Electronics, 2022, 44(3): 884-899.
[13] HERNANDEZ-LEAL P, TAYLOR M E, ROSMAN B, et al. Identifying and tracking switching, non-stationary opponents: a Bayesian approach[C]//Proc. of the 30th AAAI Conference on Artificial Intelligence, 2016.
[14] YANG T P, MENG Z P, HAO J Y, et al. Towards efficient detection and optimal response against sophisticated opponents[C]//Proc. of the 28th International Joint Conference on Artificial Intelligence, 2019: 623-629.
[15] WEERD H D, VERBRUGGE R, VERHEIJ B. How much does it help to know what she knows you know? An agent-based simulation study[J]. Artificial Intelligence, 2013, 199: 67-92.
[16] HERNANDEZ-LEAL P, KARTAL B, TAYLOR M E. A survey and critique of multiagent deep reinforcement learning[J]. Autonomous Agents and Multi-Agent Systems, 2019, 33: 750-797.
[17] ZHENG Y, MENG Z P, HAO J Y, et al. A deep Bayesian policy reuse approach against non-stationary agents[C]//Proc. of the Advances in Neural Information Processing Systems, 2018.
[18] BANK D, KOENIGSTEIN N, GIRYES R. Autoencoders[M]//Machine learning for data science handbook. Cham: Springer, 2023. doi: 10.1007/978-3-031-24628-9_16.
[19] ZHAI J H, ZHANG S F, CHEN J F, et al. Autoencoder and its various variants[C]//Proc. of the IEEE International Conference on Systems, Man, and Cybernetics, 2018: 415-419.
[20] LI C J, ZHOU D, GU Q, et al. Learning two-player Markov games: neural function approximation and correlated equilibrium[J]. Advances in Neural Information Processing Systems, 2022, 35: 33262-33274.
[21] GUO W B, WU X, HUANG S, et al. Adversarial policy learning in two-player competitive games[C]//Proc. of the 38th International Conference on Machine Learning, 2021: 3910-3919.
[22] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. [2023-09-05]. https://doi.org/10.48550/arXiv.1707.06347.
[23] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proc. of the 33rd International Conference on Machine Learning, 2016.
[24] JIANG N, WANG J. The theory of information and coding[M]. Beijing: Tsinghua University Press, 2010.
[25] TIAN Z, WEN Y, GONG Z C, et al. A regularized opponent model with maximum entropy objective[C]//Proc. of the 28th International Joint Conference on Artificial Intelligence, 2019.
[26] WIMMER L, SALE Y, HOFMAN P, et al. Quantifying aleatoric and epistemic uncertainty in machine learning: are conditional entropy and mutual information appropriate measures?[C]//Proc. of the 39th Conference on Uncertainty in Artificial Intelligence, 2023: 2282-2292.
[27] MURPHY K P. Probabilistic machine learning: an introduction[M]. Cambridge: MIT Press, 2022.
[28] DI CRESCENZO A, LONGOBARDI M. On cumulative entropies[J]. Journal of Statistical Planning and Inference, 2009, 139(12): 4072-4087.
[29] PAPOUDAKIS G, CHRISTIANOS F, ALBRECHT S. Agent modelling under partial observability for deep reinforcement learning[J]. Advances in Neural Information Processing Systems, 2021, 34: 19210-19222.
[30] LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[C]//Proc. of the 31st International Conference on Neural Information Processing Systems, 2017: 6382-6393.