Improved three-dimensional A^* algorithm of real-time path planning based on reinforcement learning

doi:10.12305/j.issn.1001-506X.2023.01.23

Abstract

Online flight path planning places high demands on both the real-time performance of the algorithm and the optimality of its result. To address this problem, a three-dimensional A^* algorithm is improved using reinforcement learning. First, a shrinkage factor is introduced into the weighting of the heuristic information in the cost function to improve the time performance of the algorithm. Second, a model is established to measure the change in real-time performance and result optimality; combined with the deep deterministic policy gradient method, the action-state and reward functions are designed and the shrinkage factor is optimized through training. Finally, the improved three-dimensional A^* algorithm is verified by simulation in multiple scenarios. The simulation results show that the improved algorithm effectively improves time performance while preserving the optimality of the planned path.

Key words: improved A^* algorithm, shrinkage factor, reinforcement learning, deep deterministic policy gradient, real-time path planning
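
The role that heuristic weighting plays in the trade-off between search time and path optimality can be illustrated with a minimal sketch. The exact form of the shrinkage-factor weighting is not given in the abstract, so the code below assumes the generic weighted A^* cost f(n) = g(n) + w·h(n) on a 6-connected unit grid; the function name `weighted_astar_3d`, the parameter `weight`, and the toy grid are hypothetical and are not taken from the paper.

```python
import heapq
import math


def weighted_astar_3d(start, goal, blocked, size, weight=1.0):
    """Weighted A* on a size^3 grid with 6-connected unit moves.

    weight = 1.0 is ordinary A*; weight > 1.0 trusts the heuristic more,
    which tends to expand fewer nodes but may return a longer path
    (at most `weight` times the optimal cost for a consistent heuristic).
    Returns (path_cost, expanded_nodes), or (None, expanded_nodes) if
    the goal is unreachable.
    """
    def h(p):
        # Euclidean distance: consistent for unit axis-aligned moves.
        return math.dist(p, goal)

    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    open_heap = [(weight * h(start), 0.0, start)]
    best_g = {start: 0.0}
    closed = set()
    expanded = 0

    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale heap entry
        closed.add(node)
        expanded += 1
        if node == goal:
            return g, expanded
        for dx, dy, dz in moves:
            nxt = (node[0] + dx, node[1] + dy, node[2] + dz)
            if any(c < 0 or c >= size for c in nxt) or nxt in blocked:
                continue
            ng = g + 1.0
            if ng < best_g.get(nxt, math.inf):
                best_g[nxt] = ng
                heapq.heappush(open_heap, (ng + weight * h(nxt), ng, nxt))
    return None, expanded


# Toy comparison: same map, two heuristic weights (hypothetical values).
obstacles = {(2, 2, 2)}
cost_plain, n_plain = weighted_astar_3d((0, 0, 0), (4, 4, 4), obstacles, 5, weight=1.0)
cost_fast, n_fast = weighted_astar_3d((0, 0, 0), (4, 4, 4), obstacles, 5, weight=2.0)
```

In this framing, a learned factor would take the place of the fixed `weight`, which is the quantity the abstract says is tuned by training.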

CLC number:

TJ765

Zhi REN, Dong ZHANG, Shuo TANG. Improved three-dimensional A^* algorithm of real-time path planning based on reinforcement learning[J]. Systems Engineering and Electronics, 2023, 45(1): 193-201.

No.	Parameter	Value
1	Flight speed/(m/s)	200
2	Safe flight altitude/m	500
3	Maximum pitch angle/(°)	20
4	Maximum heading angle/(°)	45
5	Minimum turning radius/m	1242.6

No.	Longitude and latitude/(°)	Threat radius/km
1	(112.4, 38.6)	5
2	(112.7, 38.4)	5
3	(112.5, 38.44)	5
4	(112.6, 38.25)	5
5	(112.4, 38.4)	5
6	(112.2, 38.5)	5
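
As a rough illustration of how such a threat table might be used, the sketch below tests whether a waypoint falls inside any circular threat zone. It assumes the coordinate pairs are (longitude, latitude) in degrees, that each threat is a circle on the Earth's surface, and that great-circle (haversine) distance on a spherical Earth of radius 6371 km is accurate enough; none of these modelling choices is stated in the source, and the names `THREATS`, `haversine_km`, and `in_threat` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

# ((longitude_deg, latitude_deg), radius_km), copied from the table above.
THREATS = [
    ((112.4, 38.6), 5.0),
    ((112.7, 38.4), 5.0),
    ((112.5, 38.44), 5.0),
    ((112.6, 38.25), 5.0),
    ((112.4, 38.4), 5.0),
    ((112.2, 38.5), 5.0),
]


def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_threat(point, threats=THREATS):
    """True if `point` lies inside or on the boundary of any threat circle."""
    return any(haversine_km(point, centre) <= radius
               for centre, radius in threats)
```

A planner could call `in_threat` on each candidate node and discard those that return True before they enter the open list.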

No.	Parameter	Value
1	t_i^*/s	264.13
2	t_imax/s	0.932
3	e_i^*	49
4	e_imax	53

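No.	Parameter	Value
1	Actor network learning rate	0.001
2	Critic network learning rate	0.002
3	Training batch size	128
4	Experience replay buffer size	10000
5	Soft policy update factor	0.01
6	Initial value of the shrinkage factor	0.4
7	Immediate reward convergence factor	0.9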
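
To make the settings above concrete, the following sketch collects them in a configuration object and shows the soft target-network update that is standard in deep deterministic policy gradient methods, θ_target ← τ·θ_online + (1 − τ)·θ_target. It assumes that the "soft policy update factor" in the table is this τ; the class name `DDPGConfig`, its field names, and the plain-list representation of network weights are hypothetical simplifications rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DDPGConfig:
    """Training settings copied from the table above (field names are illustrative)."""
    actor_lr: float = 0.001
    critic_lr: float = 0.002
    batch_size: int = 128
    replay_capacity: int = 10_000
    tau: float = 0.01                       # assumed to be the soft update factor
    initial_shrinkage_factor: float = 0.4
    reward_convergence_factor: float = 0.9


def soft_update(target_weights, online_weights, tau):
    """Return new target weights: tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]


cfg = DDPGConfig()
target = [0.0, 1.0]
online = [1.0, 1.0]
target = soft_update(target, online, cfg.tau)
```

With τ = 0.01 the target network moves only 1% of the way toward the online network at each step, which is the usual reason for choosing a small value: it keeps the critic's learning targets from shifting abruptly.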

Scenario	Start node	Goal node	No-fly zone 1	No-fly zone 2	No-fly zone 3	No-fly zone 4	No-fly zone 5	No-fly zone 6
1	(112.9, 38.2)	(112.1, 38.8)	(112.4, 38.6)	(112.7, 38.4)	(112.5, 38.44)	(112.6, 38.25)	(112.4, 38.4)	(112.2, 38.5)
2	(112.9, 38.7)	(112.1, 38.2)	(112.4, 38.6)	(112.7, 38.4)	(112.5, 38.44)	(112.6, 38.25)	(112.4, 38.4)	(112.2, 38.5)
3	(112.9, 38.5)	(112.1, 38.4)	(112.4, 38.6)	(112.7, 38.4)	(112.5, 38.44)	(112.6, 38.25)	(112.4, 38.4)	(112.2, 38.5)
4	(112.9, 38.2)	(112.1, 38.8)	(112.7, 38.3)	(112.4, 38.6)	(112.4, 38.3)	(112.2, 38.4)	(112.6, 38.3)	(112.6, 38.4)
5	(112.9, 38.7)	(112.1, 38.2)	(112.7, 38.3)	(112.4, 38.6)	(112.4, 38.3)	(112.2, 38.4)	(112.6, 38.3)	(112.6, 38.4)
6	(112.9, 38.5)	(112.1, 38.4)	(112.7, 38.3)	(112.4, 38.6)	(112.4, 38.3)	(112.2, 38.4)	(112.6, 38.3)	(112.6, 38.4)