基于多智能体强化学习的动态频谱分配方法综述

doi:10.12305/j.issn.1001-506X.2021.11.35

Abstract

Abstract:

Cognitive radio and dynamic spectrum allocation technology are effective means to solve the scarcity of spectrum. With the rapid development of machine learning technology including deep learning and reinforcement learning in recent years, the swarm intelligence technology represented by multi-agent reinforcement learning is continuously making breakthroughs, which is also making distributed and intelligent dynamic spectrum allocation possible. This paper reviews the key research achievements in reinforcement learning and multi-agent reinforcement learning in detail, as well as research in modeling methods and algorithms of dynamic spectrum allocation process based on multi-agent reinforcement learning. The method could boil down to four types: independent Q-learning, cooperating Q-learning, joint Q-learning and multi-agent actor-critic. The advantages and disadvantages of the existing four types of methods are analyzed, and the critical problems and possible solutions of the dynamic spectrum allocation method based on multi-agent reinforcement learning are summarized.

Key words: spectrum management, cognitive radio, dynamic spectrum allocation, machine learning, reinforcement learning, multi-agent reinforcement learning

CLC Number:

TN929.5

Bo SONG, Wei YE, Xianghui MENG. Review of multi-agent reinforcement learning based dynamic spectrum allocation method[J]. Systems Engineering and Electronics, 2021, 43(11): 3338-3351.

Figures/Tables 9

Fig.1

Fig.2

Fig.3

Fig.4

Fig.5

Fig.6

Fig.7

Fig.8

Table 1

References 89

1	MITOLA J , MAGUIRE G Q . Cognitive radio: making software radios more personal[J]. IEEE Personal Communications, 1999, 6 (4): 13- 18. doi: 10.1109/98.788210
2	HAYKIN S . Cognitive radio: brain-empowered wireless communications[J]. IEEE Journal on Selected Areas in Communications, 2005, 23 (2): 201- 220. doi: 10.1109/JSAC.2004.839380
3	PENG C , ZHENG H , ZHAO B Y . Utilization and fairness in spectrum assignment for opportunistic spectrum access[J]. Mobile Networks and Applications, 2006, 11 (4): 555- 576. doi: 10.1007/s11036-006-7322-y
4	廖楚林, 陈劼, 唐友喜, 等. 认知无线电中的并行频谱分配算法[J]. 电子与信息学报, 2007, 7, 1608- 1611.
	LIAO C L , CHEN J , TANG Y X , et al. Parallel spectrum allocation algorithm in cognitive radio[J]. Journal of Electronics and Information Technology, 2007, 7, 1608- 1611.
5	郝丹丹, 邹仕洪, 程时端. 开放式频谱系统中启发式动态频谱分配算法[J]. 软件学报, 2008, 2008 (3): 479- 491.
	HAO D D , ZOU S H , CHENG S R . Heuristic dynamic spectrum allocation algorithm in open spectrum system[J]. Journal of Software, 2008, 2008 (3): 479- 491.
6	WANG W, LIU X. List-coloring based channel allocation for open-spectrum wireless networks[C]//Proc. of the IEEE 62nd Vehicular Technology Conference, 2005: 690-694.
7	刘鹏, 张国翊, 舒放, 等. 基于图论的认知无线网络频谱动态分配[J]. 电讯技术, 2020, 60 (6): 625- 631. doi: 10.3969/j.issn.1001-893x.2020.06.002
	LIU P , ZHANG G Y , SHU F , et al. Dynamic spectrum allocation of cognitive wireless network based on graph theory[J]. Telecommunication Engineering, 2020, 60 (6): 625- 631. doi: 10.3969/j.issn.1001-893x.2020.06.002
8	何建强, 滕志军, 刘皎. 一种基于改进颜色敏感图论着色的频谱分配算法[J]. 计算机与数字工程, 2019, 47 (8): 1866- 1868, 1889. doi: 10.3969/j.issn.1672-9722.2019.08.004
	HE J Q , TENG Z J , LIU J . A spectrum allocation algorithm based on improved color sensitive graph theory coloring[J]. Computer & Digital Engineering, 2019, 47 (8): 1866- 1868, 1889. doi: 10.3969/j.issn.1672-9722.2019.08.004
9	NEEL J, REED J H, GILLES R P. The role of game theory in the analysis of software radio networks[C]//Proc. of the SDR Forum Technology Conference, 2002.
10	NEEL J O, REED J H, GILLES R P. Convergence of cognitive radio networks[C]//Proc. of the IEEE Wireless Communications and Networking Conference, 2004: 2250-2255.
11	滕志军, 韩雪, 杨旭. 认知无线电中基于博弈论的频谱分配算法[J]. 计算机应用研究, 2011, 28 (7): 2661- 2663. doi: 10.3969/j.issn.1001-3695.2011.07.073
	TENG Z J , HAN X , YANG X . Spectrum allocation algorithm based on game theory in cognitive radio[J]. Application Research of Computers, 2011, 28 (7): 2661- 2663. doi: 10.3969/j.issn.1001-3695.2011.07.073
12	CAO L, ZHENG H. Distributed spectrum allocation via local bargaining[C]//Proc. of the IEEE International Conference on Sensing, Communication and Networking, 2005: 475-486.
13	ETKIN R , PAREKH A , TSE D . Spectrum sharing for unlicensed bands[J]. IEEE Journal on Selected Areas in Communications, 2007, 25 (3): 517- 528. doi: 10.1109/JSAC.2007.070402
14	徐昌彪, 刘雪亮, 鲜永菊. 基于博弈论的动态频谱分配技术研究[J]. 电子技术应用, 2012, 38 (4): 102- 105, 109.
	XU C B , LIU X L , XIAN Y J . Research on dynamic spectrum allocation technology based on game theory[J]. Application of Electronic Technique, 2012, 38 (4): 102- 105, 109.
15	刘鑫一, 姜建. 基于拍卖理论的认知物联网频谱分配策略[J]. 中国科技论文, 2016, 11 (19): 2187- 2192. doi: 10.3969/j.issn.2095-2783.2016.19.008
	LIU X Y , JIANG J . Cognitive IoT spectrum allocation strategy based on auction theory[J]. China Science Paper, 2016, 11 (19): 2187- 2192. doi: 10.3969/j.issn.2095-2783.2016.19.008
16	CHEN J , GAO Y M , KUO Y H . A parallel repeated auction for spectrum allocation in distributed cognitive radio networks[J]. Wireless personal communications, 2014, 77 (4): 2839- 2855. doi: 10.1007/s11277-014-1671-9
17	ZHOU X, ZHENG H. TRUST: a general framework for truthful double spectrum auctions[C]//Proc. of the IEEE International Conference on Computer Communications, 2009: 999-1007.
18	WANG Q, YE B, XU T, et al. An approximate truthfulness motivated spectrum auction for dynamic spectrum access[C]//Proc. of the IEEE Wireless Communications and Networking Conference, 2011: 257-262.
19	MNIH V , KAVUKCUOGLU K , SILV-ER D , et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518 (7540): 529- 533. doi: 10.1038/nature14236
20	HASSELT H . Double Q-learning[J]. Advances in neural information processing systems (NeurIPS), 2010, 23, 2613- 2621.
21	VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double q-learning[C]//Proc. of the AAAI Conference on Artificial Intelligence, 2016.
22	WANG Z, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning[C]//Proc. of Machine Learning Research, 2016: 1995-2003.
23	FORTUNATO M, AZAR M G, PIOT B, et al. Noisy networks for exploration[EB/OL]. [2021-05-08]. https://arxiv.org/abs/1706.10295.
24	HESSEL M, MODAYIL J, VAN HASSELT H, et al. Rainbow: combining improvements in deep reinforcement learning[C]//Proc. of the AAAI 32nd Conference on Artificial Intelligence, 2018: 3215-3223.
25	GU S, LILLICRAP T, SUTSKEVER I, et al. Continuous deep Q-learning with model-based acceleration[C]//Proc. of the International Conference on Machine Learning, 2016: 2829-2838.
26	KONDA V R, TSITSIKLIS J N. Act-or-Critic algorithms[C]//Proc. of the Advances in Neural Information Processing Systems, 2000: 1008-1014.
27	MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proc. of the International conference on Machine Learning, 2016: 1928-1937.
28	SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust region policy optimization[C]//Proc. of the International Conference on Machine Learning, 2015: 1889-1897.
29	WU Y, MANSIMOV E, GROSSE R B, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation[C]//Proc. of the Advances in Neural Information Processing Systems, 2017: 5279-5288.
30	SCHULMAN J, WOLSKI F, DHARI-WAL P, et al. Proximal policy optimization algorithms[EB/OL]. [2021-05-08]. https://arxiv.org/abs/1707.06347.
31	WANG Z, BAPST V, HEESS N, et al. Sample efficient actor-critic with experience replay[EB/OL]. [2021-05-08] https://arxiv.org/abs/1611.01224.
32	MUNOS R, STEPLETON T, HARUT-YUNYAN A, et al. Safe and efficient off-policy reinforcement learning[EB/OL]. [2021-05-08]. https://arxiv.org/abs/1606.02647.
33	SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]//Proc. of the 31st International Conference on Machine Learning Volume, 2014: I-387-I-395.
34	FUJIMOTO S, VAN HOOF H, MEG-ER D. Addressing function approximation error in actor-critic methods[C]//Proc. of the Machine Learning Research, 2018.
35	HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//Proc. of the International Confe-rence on Machine Learning, 2018: 1861-1870.
36	SUTTON R S , BARTO A G . Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 2018.
37	SILVER D, SUTTON R S, MVLLERM. Sample-based learning and search with permanent and transient memories[C]//Proc. of the International Conference on Machine Learning, 2008: 968-975.
38	COULOM R. Efficient selectivity and backup operators in Monte-Carlo tree search[C]//Proc. of the International Conference on Computers and Games, 2006: 72-83.
39	SILVER D , HUANG A , MADDISON C J , et al. Mastering the game of go with deep neural networks and tree search[J]. Nature, 2016, 529 (7587): 484- 489. doi: 10.1038/nature16961
40	SILVER D , SCHRITTWIESER J , SI-MONYAN K , et al. Mastering the game of go without human knowledge[J]. Nature, 2017, 550 (7676): 354- 359. doi: 10.1038/nature24270
41	SCHRITTWIESER J , ANTONOGLOU I , HUBERT T , et al. Mastering Atari, go, chess and shogi by planning with a learned model[J]. Nature, 2020, 588 (7839): 604- 609. doi: 10.1038/s41586-020-03051-4
42	GOODFELLOW I , POUGET-ABADIE J , MIRZA M , et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63 (11): 139- 144. doi: 10.1145/3422622
43	HO J, ERMON S. Generative adversarial imitation learning[C]//Proc. of the Advances in Neural Information Processing Systems, 2016, 29: 4565-4573.
44	LEVINE S, KUMAR A, TUCKER G, et al. Offline reinforcement learning: tutorial, review, and perspectives on open problems[EB/OL]. [2021-05-08]. https://arxiv.org/abs/2005.01643.
45	殷昌盛, 杨若鹏, 朱巍, 等. 多智能体分层强化学习综述[J]. 智能系统学报, 2020, 15 (4): 646- 655.
	YIN C S , YANG R P , ZHU W , et al. A survey on multi-agent hierarchical reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2020, 15 (4): 646- 655.
46	BUȘONIU L , BABUŠKA R , SCHUTTER B D . Innovations in multi-agent systems and applications[M]. Heidelberg: Springer, 2010: 183- 221.
47	LITTMAN M L . Value-function reinforcement learning in Markov games[J]. Cognitive Systems Research, 2001, 2 (1): 55- 66. doi: 10.1016/S1389-0417(01)00015-8
48	LAUER M, RIEDMILLER M. An algorithm for distributed reinforcement learning in cooperative multi-agent systems[C]//Proc. of the 17th International Conference on Machine Learning, 2000: 535-542.
49	LITTMAN M L . Machine learning proceedings[M]. Amsterdam: Elsevier Inc., 1994: 157- 163.
50	HU J, WELLMAN M P. Multiagent reinforcement learning: theoretical framework and an algorithm[C]//Proc. of the International Conference on Machine Learning, 1998: 242-250.
51	GREENWALD A, HALL K, SERRA-NO R. Correlated Q-learning[C]//Proc. of the International Conference on Machine Learning, 2003: 242.
52	梁星星, 冯旸赫, 马扬, 等. 多Agent深度强化学习综述[EB/OL]. [2020-11-30]. https://doi.org/10.16383/j.aas.c180372.
	LIANG X X, FENG Y H, MA Y, et al. Re-view of multi-Agent deep reinforcement learning[EB/OL]. [2020-11-30]. https://doi.org/10.16383/j.aas.c180372.
53	SUKHBAATAR S, FERGUS R. Learning multiagent communication with backpropagation[C]//Proc. of the Advances in Neural Information Processing Systems, 2016: 2244-2252.
54	PENG P, YUAN Q, WEN Y, et al. Multi-agent bidirectionally-coordinated nets for learning to play starcraft combat games[EB/OL]. [2021-05-08]. https://arxiv.org/abs/1703.10069.
55	TAN M. Multi-agent reinforcement learning: independent vs. cooperative agents[C]//Proc. of the 10th International Conference on Machine Learning, 1993: 330-337.
56	MATIGNON L, LAURENT G J, LE FORT-PIAT N. Hysteretic Q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams[C]//Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007: 64-69.
57	OMIDSHAFIEI S, PAZIS J, AMATO C, et al. Deep decentralized multi-task multi-agent reinforcement learning under partial observability[C]//Proc. of the International Conference on Machine Learning, 2017: 2681-2690.
58	PEROLAT J, PIOT B, PIETQUIN O. Actor-critic fictitious play in simultaneous move multistage games[C]//Proc. of the International Conference on Artificial Intelligence and Statistics, 2018: 919-928.
59	FOERSTER J, ASSAEL Y, DEFREI-TAS N, et al. Learning to communicate with deep multi-agent reinforcement learning[C]//Proc. of the Advances in Neural Information Processing Systems, 2016: 2137-2145.
60	MAO H, GONG Z, NI Y, et al. ACCNet: actor-coordinator-critic net for "learning-to-communicate" with deep multi-agent reinforcement learning[EB/OL]. [2021-05-08]. https://arxiv.org/abs/1706.03235.
61	LOWE R, WU Y I, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[C]//Proc. of the Advances in Neural Information Processing Systems, 2017: 6379-6390.
62	FOERSTER J, FARQUHAR G, AFO-URAS T, et al. Counterfactual multi-agent policy gradients[J]. arXiv preprint arXiv: 1705.08926, 2017.
63	SUNEHAG P, LEVER G, GRUSLYS A, et al. Value-decomposition networks for cooperative multi-agent learning based on team reward[C]//Proc. of the 17th International Conference on Autonomous Agents and Multi-Agent Systems, 2018: 2085-2087.
64	RASHID T, SAMVELYAN M, SCHROEDER C, et al. Qmix: monotonic value function factorisation for deep multi-agent reinforcement learning[C]//Proc. of the International Conference on Machine Learning, 2018: 4295-4304.
65	VASWANI A, SHAZEER N, PARMA-R N, et al. Attention is all you need[C]//Proc. of the Advances in Neural Information Processing Systems, 2017: 5998-6008.
66	YANG Y, HAO J, LIAO B, et al. Q-atten: a general framework for cooperative multiagent reinforcement learning[EB/OL]. [2021-05-08]. https://arxiv.org/abs/2002.03939.
67	SHALEV-SHWARTZ S, SHAMMAH S, SHASHUA A. Safe, multi-agent, reinforcement learning for autonomous driving[J]. arXiv preprint arXiv: 1610.03295, 2016.
68	AYDIN M E, FELLOWS R. A reinforcement learning algorithm for building collaboration in multi-agent systems[EB/OL]. [2021-05-08]. https://arxiv.org/abs/1711.10574.
69	LUO Y, SHI Z P, ZHOU X, et al. Dynamic resource allocations based on Q-learning for D2D communication in cellular networks[C]//Proc. of the IEEE 11th International Computer Conference on Wavelet Active Media Technology and Information Processing, 2014: 385-388.
70	ZHAO Q , SADLER B M . A survey of dynamic spectrum access[J]. IEEE Signal Processing Magazine, 2007, 24 (3): 79- 89. doi: 10.1109/MSP.2007.361604
71	KOLODZY P. Spectrum policy task force: findings and recommendations[C]//Proc. of the International Symposium on Advanced Radio Technologies, 2003: 459-465.
72	CHEN Y X , ZHAO Q , SWAMI A . Joint design and separation principle for opportunistic spectrum access in the presence of sensing errors[J]. IEEE Trans. on Information Theory, 2008, 54 (5): 2053- 2071. doi: 10.1109/TIT.2008.920248
73	夏婷婷. 基于部分可观测马尔可夫决策过程的机会频谱接入方案设计[D]. 南京: 南京邮电大学, 2014.
	XIA T T. Project design of the opportunistic spectrum access based on POMDP[D]. Nanjing: Nanjing University of Posts and Telecommunications, 2014.
74	郭冰洁. 认知无线电系统中多信道动态频谱接入算法研究[D]. 成都: 电子科技大学, 2013.
	GUO B J. Research on multi-channel dynamic spectrum access algorithms in cognitive radio systems[D]. Chengdu: University of Electronic Science and Technology of China, 2013.
75	何浩. 认知无线电动态频谱接入关键技术研究[D]. 成都: 电子科技大学, 2014.
	HE H. Research on key technologies of dynamic spectrum access in cognitive radio[D]. Chengdu: University of Electronic Science and Technology of China, 2014.
76	叶梓峰. 基于深度强化学习的动态频谱分配方法研究[D]. 广州: 广东工业大学, 2019.
	YE Z F. Research of dynamic spectrum allocation methods based on deep reinforcement learning[D]. Guangzhou: Guangdong University of Technology, 2019.
77	LI H. Multi-agent Q-learning of channel selection in multi-user cognitive radio systems: a two by two case[C]//Proc. of the IEEE International Conference on Systems, Man and Cybernetics, 2009: 1893-1898.
78	TENG Y L, ZHANG Y, NIU F, et al. Reinforcement learning based auction algorithm for dynamic spectrum aces-s in cognitive radio networks[C]//Proc. of the IEEE 72nd Vehicular Technology Conference-Fall, 2010.
79	WU C, CHOWDHURY K, FELICE M D, et al. Spectrum management of cognitive radio using multi-agent reinforcement learning[C]//Proc. of the 9th International Conference on Autonomous Agents and Multi-Agent Systems: Industry Track, 2010: 1705-1712.
80	伍春, 江虹, 易克初. 聚类多Agent强化学习认知无线电资源分配[J]. 北京邮电大学学报, 2014, 37 (1): 80- 84.
	WU C , JIANG H , YI K C . Clustering multi-Agent reinforcement learning cognitive radio resource allocation[J]. Journal of Beijing University of Posts and Telecommunications, 2014, 37 (1): 80- 84.
81	ZIA K , JAVED N , SIAL M N , et al. A distributed multi-agent RL-based autonomous spectrum allocation scheme in D2D enabled multi-tier HetNets[J]. IEEE Access, 2019, 7, 6733- 6745. doi: 10.1109/ACCESS.2018.2890210
82	ASHERALIEVA A, MIYANAGA Y. Multi-agent Q-learning for autonomous D2D communication[C]//Proc. of the IEEE International Symposium on Intelligent Signal Processing and Communication Systems, 2016.
83	NAPARSTEK O , COHEN K . Deep multi-user reinforcement learning for distributed dynamic spectrum access[J]. IEEE Trans. on Wireless Communications, 2018, 18 (1): 310- 323.
84	ZHAO J H , ZHANG Y , NIE Y W , et al. Intelligent resource allocation for train-to-train communication: a multi-agent deep reinforcement learning approach[J]. IEEE Access, 2020, 8, 8032- 8040. doi: 10.1109/ACCESS.2019.2963751
85	NASIR Y S , GUO D . Multi-agent deep reinforcement learning for dynamic power allocation in wireless networks[J]. IEEE Journal on Selected Areas in Communications, 2019, 37 (10): 2239- 2250. doi: 10.1109/JSAC.2019.2933973
86	YAU K L A, KOMISARCZUK P, PAUL D T. Enhancing network performance in distributed cognitive radio networks using single-agent and multi-agent reinforcement learning[C]//Proc. of the IEEE Local Computer Network Conference, 2010: 152-159.
87	LALL S, SADHU A K, KONAR A, et al. Multi-agent reinforcement learning for stochastic power management in cognitive radio network[C]//Proc. of the IEEE International Conference on Microelectronics, Computing and Communications, 2016.
88	WANG S , LIU H , GOMES P H , et al. Deep reinforcement learning for dynamic multichannel access in wireless networks[J]. IEEE Trans. on Cognitive Communications and Networking, 2018, 4 (2): 257- 265.
89	LI Z , GUO C L . Multi-agent deep reinforcement learning based spectrum allocation for D2D underlay communications[J]. IEEE Trans. on Vehicular Technology, 2019, 69 (2): 1828- 1840.

区别	基于IQL的方法	基于CQL的方法	基于JQL的方法	基于MAAC的方法
是否需要控制中心	否	否	是	否
是否需要用户协作	否	是	是	否
是否需要集中训练	否	否	是	是
适用的用户数量	少	少	较多	多
训练稳定性	低	中	中	高
对用户间信息交互的需求	无	大	大	无

[1]	Bakun ZHU, Weigang ZHU, Wei LI, Ying YANG, Tianhao GAO. Research on decision-making modeling of cognitive jamming for multi-functional radar based on Markov [J]. Systems Engineering and Electronics, 2022, 44(8): 2488-2497.
[2]	Guan WANG, Haizhong RU, Dali ZHANG, Guangcheng MA, Hongwei XIA. Design of intelligent control system for flexible hypersonic vehicle [J]. Systems Engineering and Electronics, 2022, 44(7): 2276-2285.
[3]	Lingyu MENG, Bingli GUO, Wen YANG, Xinwei ZHANG, Zuoqing ZHAO, Shanguo HUANG. Network routing optimization approach based on deep reinforcement learning [J]. Systems Engineering and Electronics, 2022, 44(7): 2311-2318.
[4]	Dongzi GUO, Rong HUANG, Hechuan XU, Liwei SUN, Naigang CUI. Research on deep deterministic policy gradient guidance method for reentry vehicle [J]. Systems Engineering and Electronics, 2022, 44(6): 1942-1949.
[5]	Mingren HAN, Yufeng WANG. Optimization method for orbit transfer of all-electric propulsion satellite based on reinforcement learning [J]. Systems Engineering and Electronics, 2022, 44(5): 1652-1661.
[6]	Chunzheng WANG, Minghua HU, Lei YANG, Zheng ZHAO. Review on air traffic delay prediction [J]. Systems Engineering and Electronics, 2022, 44(3): 863-874.
[7]	Li HE, Liang SHEN, Hui LI, Zhuang WANG, Wenquan TANG. Survey on policy reuse in reinforcement learning [J]. Systems Engineering and Electronics, 2022, 44(3): 884-899.
[8]	Xinyu ZHANG, Yuan LIU, Jianing SONG. Short-term orbit prediction based on LSTM neural network [J]. Systems Engineering and Electronics, 2022, 44(3): 939-947.
[9]	Bakun ZHU, Weigang ZHU, Wei LI, Ying YANG, Tianhao GAO. Multi-function radar intelligent jamming decision method based on prior knowledge [J]. Systems Engineering and Electronics, 2022, 44(12): 3685-3695.
[10]	Qingqing YANG, Yingying GAO, Yu GUO, Boyuan XIA, Kewei YANG. Target search path planning for naval battle field based on deep reinforcement learning [J]. Systems Engineering and Electronics, 2022, 44(11): 3486-3495.
[11]	Bin ZENG, Hongqiang ZHANG, Houpu LI. Research on anti-submarine strategy for unmanned undersea vehicles [J]. Systems Engineering and Electronics, 2022, 44(10): 3174-3181.
[12]	Qitian WAN, Baogang LU, Yaxin ZHAO, Qiuqiu WEN. Autopilot parameter rapid tuning method based on deep reinforcement learning [J]. Systems Engineering and Electronics, 2022, 44(10): 3190-3199.
[13]	He LI, wenjing ZHAO, Minglu JIN. Improved spectrum sensing algorithms based on high order moments of eigenvalues [J]. Systems Engineering and Electronics, 2022, 44(10): 3243-3248.
[14]	Bin ZENG, Rui WANG, Houpu LI, Xu FAN. Scheduling strategies research based on reinforcement learning for wartime support force [J]. Systems Engineering and Electronics, 2022, 44(1): 199-208.
[15]	Zhiwei JIANG, Yang HUANG, Qihui WU. Anti-interference frequency allocation based on kernel reinforcement learning [J]. Systems Engineering and Electronics, 2021, 43(6): 1547-1556.

Review of multi-agent reinforcement learning based dynamic spectrum allocation method

RichHTML

PDF (PC)

Knowledge

Abstract

Cite this article

share this article

Figures/Tables 9

References 89

Related Articles 15

Recommended Articles

Metrics

Comments