Systems Engineering and Electronics ›› 2020, Vol. 42 ›› Issue (9): 2091-2097.doi: 10.3969/j.issn.1001-506X.2020.09.27
Changsheng YIN, Ruopeng YANG, Wei ZHU, Xiaofei ZOU
Received: 2019-12-31
Online: 2020-08-26
Published: 2020-08-26
Changsheng YIN, Ruopeng YANG, Wei ZHU, Xiaofei ZOU. Emergency communication network planning method based on deep reinforcement learning[J]. Systems Engineering and Electronics, 2020, 42(9): 2091-2097.
1 | CHITI F, FANTACCI R. A broadband wireless communication system for emergency management[J]. IEEE Wireless Communications, 2008, 15(3): 8-14. doi: 10.1109/MWC.2008.4547517 |
2 | ZHANG S N, LIU D L. Network optimization based on CW saving algorithm and genetic algorithm[J]. Journal of Jilin University, 2018, 56(5): 1219-1223. |
3 | ZHANG J, YANG X L. Mobile communication network self-planning based on simulated annealing algorithm[J]. Computer Engineering, 2017, 43(5): 83-87. doi: 10.3969/j.issn.1000-3428.2017.05.013 |
4 | ZHOU Y H. Research on node deployment and topology optimization strategy in FSO-based 5G backhaul networks[D]. Beijing: Beijing University of Posts and Telecommunications, 2019. |
5 | WU W J. Research on topology planning for multi-interface multi-channel wireless Mesh networks[D]. Nanjing: Southeast University, 2013. |
6 | LE D N, NGUYEN N G, DINH N H, et al. Optimizing gateway placement in wireless mesh networks based on ACO algorithm[J]. International Journal of Computer & Communication Engineering, 2013, 2(2): 45-53. |
7 | KAMAR A, NAWAZ S J, PATWARY M M, et al. Optimized algorithm for cellular network planning based on terrain and demand analysis[C]//Proc. of the International Conference on Computer Technologies and Development, 2010: 359-364. |
8 | ZHOU Z H. Machine learning[M]. Beijing: Tsinghua University Press, 2016. |
9 | LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444. doi: 10.1038/nature14539 |
10 | LIU Q, ZHAI J W, ZHANG Z Z, et al. A survey on deep reinforcement learning[J]. Chinese Journal of Computers, 2018, 41(1): 1-27. |
11 | FERRET J, MARINIER R, GEIST M, et al. Credit assignment as a proxy for transfer in reinforcement learning[EB/OL]. [2019-7-18]. https://arxiv.org/abs/1907.08027v1. |
12 | JADERBERG M, CZARNECKI W M, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning[J]. Science, 2019, 364(6443): 859-865. doi: 10.1126/science.aau6249 |
13 | SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge, USA: Massachusetts Institute of Technology Press, 1998. |
14 | HAUSKNECHT M, STONE P. Deep recurrent Q-learning for partially observable MDPs[EB/OL].[2017-11-16]. https://arxiv.org/abs/1507.06527. |
15 | VAN H H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]//Proc. of the AAAI Conference on Artificial Intelligence, 2016: 2094-2100. |
16 | SILVER D, SCHRITTWIESER J, SIMONYAN K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017, 550(7676): 354-359. doi: 10.1038/nature24270 |
17 | SHAN K, ZHU Y H, ZHAO D B. StarCraft micromanagement with reinforcement learning and curriculum transfer learning[J]. IEEE Transactions on Emerging Topics in Computational Intelligence, 2019, 3(1): 73-84. doi: 10.1109/TETCI.2018.2823329 |
18 | CLARK C, STORKEY A J. Training deep convolutional neural networks to play Go[C]//Proc. of the 32nd International Conference on Machine Learning, 2015, 37: 1766-1774. |
19 | LIU S Q, LEVER G, MEREL J, et al. Emergent coordination through competition[EB/OL].[2019-2-21]. https://arxiv.org/abs/1902.07151. |
20 | FORTUNATO M, TAN M, FAULKNER R, et al. Generalization of reinforcement learners with working and episodic memory[C]//Proc.of the 33rd Conference on Neural Information Processing Systems, 2019: 12448-12457. |
21 | SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489. doi: 10.1038/nature16961 |
22 | PENG X B, BERSETH G, et al. DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning[J]. ACM Transactions on Graphics, 2017, 36(4): 1-16. |
23 | SCHERRER B, GHAVAMZADEH M, GABILLON V, et al. Approximate modified policy iteration and its application to the game of Tetris[J]. The Journal of Machine Learning Research, 2015, 16(1): 1629-1676. |
24 | MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. doi: 10.1038/nature14236 |
25 | MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proc. of the 33rd International Conference on Machine Learning, 2016, 48: 1928-1937. |
26 | KRIZHEVSKY A, SUTSKEVER I, HINTON G. ImageNet classification with deep convolutional neural networks[C]//Proc. of the 25th International Conference on Neural Information Processing Systems, 2012: 1097-1105. |
27 | SALAKHUTDINOV R, MNIH A, HINTON G. Restricted Boltzmann machines for collaborative filtering[C]//Proc. of the International Conference on Machine Learning, 2007: 791-798. |
28 | SILVER D, HUBERT T. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play[J]. Science, 2018, 362(6419): 1140-1144. |
29 | JADERBERG M, CZARNECKI W M, DUNNING I, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning[J]. Science, 2019, 364(6443): 859-865. doi: 10.1126/science.aau6249 |
30 | WU B, FU Q, LIANG J, et al. Hierarchical macro strategy model for MOBA game AI[EB/OL].[2018-12-19]. https://arxiv.org/abs/1812.07887v1. |
31 | SCHMIDHUBER J. Deep learning in neural networks: an overview[J]. Neural Networks, 2015, 61: 85-117. doi: 10.1016/j.neunet.2014.09.003 |
[1] | Bakun ZHU, Weigang ZHU, Wei LI, Ying YANG, Tianhao GAO. Research on decision-making modeling of cognitive jamming for multi-functional radar based on Markov [J]. Systems Engineering and Electronics, 2022, 44(8): 2488-2497. |
[2] | Guan WANG, Haizhong RU, Dali ZHANG, Guangcheng MA, Hongwei XIA. Design of intelligent control system for flexible hypersonic vehicle [J]. Systems Engineering and Electronics, 2022, 44(7): 2276-2285. |
[3] | Lingyu MENG, Bingli GUO, Wen YANG, Xinwei ZHANG, Zuoqing ZHAO, Shanguo HUANG. Network routing optimization approach based on deep reinforcement learning [J]. Systems Engineering and Electronics, 2022, 44(7): 2311-2318. |
[4] | Dongzi GUO, Rong HUANG, Hechuan XU, Liwei SUN, Naigang CUI. Research on deep deterministic policy gradient guidance method for reentry vehicle [J]. Systems Engineering and Electronics, 2022, 44(6): 1942-1949. |
[5] | Mingren HAN, Yufeng WANG. Optimization method for orbit transfer of all-electric propulsion satellite based on reinforcement learning [J]. Systems Engineering and Electronics, 2022, 44(5): 1652-1661. |
[6] | Li HE, Liang SHEN, Hui LI, Zhuang WANG, Wenquan TANG. Survey on policy reuse in reinforcement learning [J]. Systems Engineering and Electronics, 2022, 44(3): 884-899. |
[7] | Bakun ZHU, Weigang ZHU, Wei LI, Ying YANG, Tianhao GAO. Multi-function radar intelligent jamming decision method based on prior knowledge [J]. Systems Engineering and Electronics, 2022, 44(12): 3685-3695. |
[8] | Chenrui SHI, Lu TIAN, Zhan XU, Ruxin ZHI, Jinhui CHEN. Effectiveness evaluation method of emergency communication and sensing equipment based on PSO-BP [J]. Systems Engineering and Electronics, 2022, 44(11): 3455-3462. |
[9] | Qingqing YANG, Yingying GAO, Yu GUO, Boyuan XIA, Kewei YANG. Target search path planning for naval battle field based on deep reinforcement learning [J]. Systems Engineering and Electronics, 2022, 44(11): 3486-3495. |
[10] | Bin ZENG, Hongqiang ZHANG, Houpu LI. Research on anti-submarine strategy for unmanned undersea vehicles [J]. Systems Engineering and Electronics, 2022, 44(10): 3174-3181. |
[11] | Qitian WAN, Baogang LU, Yaxin ZHAO, Qiuqiu WEN. Autopilot parameter rapid tuning method based on deep reinforcement learning [J]. Systems Engineering and Electronics, 2022, 44(10): 3190-3199. |
[12] | Bin ZENG, Rui WANG, Houpu LI, Xu FAN. Scheduling strategies research based on reinforcement learning for wartime support force [J]. Systems Engineering and Electronics, 2022, 44(1): 199-208. |
[13] | Zhiwei JIANG, Yang HUANG, Qihui WU. Anti-interference frequency allocation based on kernel reinforcement learning [J]. Systems Engineering and Electronics, 2021, 43(6): 1547-1556. |
[14] | Ang GAO, Zhiming DONG, Liang LI, Li DUAN, Qisheng GUO. Decision modeling of close-range air combat for LVC training in blue-side virtual entity [J]. Systems Engineering and Electronics, 2021, 43(6): 1606-1617. |
[15] | Jiayi LIU, Shaohua YUE, Gang WANG, Xiaoqiang YAO, Jie ZHANG. Cooperative evolution algorithm of multi-agent system under complex tasks [J]. Systems Engineering and Electronics, 2021, 43(4): 991-1002. |