Attitude balance control of two-wheeled robot based on fuzzy reinforcement learning

doi:10.12305/j.issn.1001-506X.2021.04.21

Systems Engineering and Electronics, 2021, Vol. 43, Issue (4): 1036-1043. doi: 10.12305/j.issn.1001-506X.2021.04.21

An YAN¹, Zhang CHEN²,*, Chaoyang DONG¹, Kanghui HE¹

1. School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China
2. Department of Automation, Tsinghua University, Beijing 100084, China

Received: 2020-06-13 Online: 2021-03-25 Published: 2021-03-31
Corresponding author: Zhang CHEN E-mail: yanan801@buaa.edu.cn; cz_da@tsinghua.edu.cn; dongchaoyang@buaa.edu.cn; 502711921@qq.com
About the authors: An YAN (1996-), male, master's student; main research interests: intelligent control applications and flight vehicle navigation and control. E-mail: yanan801@buaa.edu.cn | Zhang CHEN (1984-), male, assistant researcher, Ph.D.; main research interest: robot control. E-mail: cz_da@tsinghua.edu.cn | Chaoyang DONG (1965-), male, professor, Ph.D.; main research interests: flight vehicle control, simulation and modeling, and intelligent control applications. E-mail: dongchaoyang@buaa.edu.cn | Kanghui HE (1996-), male, master's student; main research interest: flight vehicle navigation and control. E-mail: 502711921@qq.com
Funding:
National Natural Science Foundation of China (61833016); National Natural Science Foundation of China (61873295); Aeronautical Artificial Intelligence Special Fund (2018ZA51003)

Abstract:

To address the inherent static instability of a monorail two-wheeled robot at rest, a control method based on fuzzy reinforcement learning (Fuzzy-Q for short) is proposed. First, the Lagrange method is used to establish the dynamic model of the system with a control moment gyro. Then, on this basis, a tabular reinforcement learning algorithm is designed to achieve stable balance control of the robot. Finally, to address the low control accuracy and the discrete controller output of that algorithm, fuzzy theory is used to generalize the action space, which improves the control accuracy and makes the control output continuous. Simulation results show that, compared with the traditional reinforcement learning method, the proposed Fuzzy-Q method significantly improves the control accuracy and effectively suppresses the influence of external disturbance torque on the system, giving the system a certain degree of anti-disturbance capability.

Key words: reinforcement learning, fuzzy reinforcement learning, fuzzy algorithm, control moment gyro, monorail two-wheeled robot
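
The last step of the abstract, using fuzzy theory to turn a discrete action set into a continuous control output, can be illustrated with a small sketch. The abstract does not give the authors' membership functions or rule base, so everything here is an assumption: the triangular fuzzy sets, the `CENTERS` values, and the function names `memberships` and `fuzzy_torque` are hypothetical. The blending rule is the one commonly used in fuzzy Q-learning, where each fuzzy rule proposes its greedy discrete action and the output is the membership-weighted sum. The torque levels are the five values in the action table listed further down this page.

```python
TORQUES = [-10.0, -1.0, 0.0, 1.0, 10.0]  # N·m, discrete action set

# Hypothetical fuzzy-set centres on the roll angle (rad); not from the paper.
CENTERS = [-0.3, -0.1, 0.0, 0.1, 0.3]


def memberships(angle, centers=CENTERS):
    """Triangular memberships that sum to 1; saturate outside the range."""
    if angle <= centers[0]:
        return [1.0] + [0.0] * (len(centers) - 1)
    if angle >= centers[-1]:
        return [0.0] * (len(centers) - 1) + [1.0]
    mu = [0.0] * len(centers)
    for i in range(len(centers) - 1):
        lo, hi = centers[i], centers[i + 1]
        if lo <= angle <= hi:
            w = (angle - lo) / (hi - lo)
            mu[i], mu[i + 1] = 1.0 - w, w
            break
    return mu


def fuzzy_torque(angle, q_rows):
    """Blend each rule's greedy torque by its membership degree.

    q_rows[i] holds one learned value per discrete torque for fuzzy rule i.
    """
    mu = memberships(angle)
    out = 0.0
    for weight, row in zip(mu, q_rows):
        best = max(range(len(row)), key=row.__getitem__)
        out += weight * TORQUES[best]
    return out
```

Because neighbouring rules overlap, the output moves smoothly between the discrete torque levels as the angle changes, which is the sense in which the controller output becomes continuous.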

CLC number: TP242

An YAN, Zhang CHEN, Chaoyang DONG, Kanghui HE. Attitude balance control of two-wheeled robot based on fuzzy reinforcement learning[J]. Systems Engineering and Electronics, 2021, 43(4): 1036-1043.

Figures/Tables: 15 (Fig. 1 to Fig. 11, Table 1 to Table 4)


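No.	Description	Range/rad
1	Very large left tilt	(-∞, -0.3)
2	Fairly large left tilt	[-0.3, -0.1)
3	Large left tilt	[-0.1, -0.05)
4	Fairly small left tilt	[-0.05, -0.02)
5	Very small left tilt	[-0.02, 0)
6	Very small right tilt	[0, 0.02]
7	Fairly small right tilt	(0.02, 0.05]
8	Large right tilt	(0.05, 0.1]
9	Fairly large right tilt	(0.1, 0.3]
10	Very large right tilt	(0.3, +∞)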

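No.	Description	Range/(rad/s)
1	Very large, leftward	(-∞, -0.9)
2	Fairly large, leftward	[-0.9, -0.5)
3	Large, leftward	[-0.5, -0.1)
4	Fairly small, leftward	[-0.1, -0.05)
5	Very small, leftward	[-0.05, 0)
6	Very small, rightward	[0, 0.05]
7	Fairly small, rightward	(0.05, 0.1]
8	Large, rightward	(0.1, 0.5]
9	Fairly large, rightward	(0.5, 0.9]
10	Very large, rightward	(0.9, +∞)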
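
The two tables above partition the tilt angle and the angular velocity into ten intervals each. A minimal sketch of how a continuous measurement could be mapped to those interval numbers follows. It assumes row 3 of the angular-velocity table is read as [-0.5, -0.1), so that the intervals tile the axis. The helper names `bin_index` and `state_index`, and the flattening of the two numbers into a single index from 0 to 99, are hypothetical and not taken from the paper.

```python
from bisect import bisect_left, bisect_right

# Interior interval edges read from the two tables (rad and rad/s).
ANGLE_NEG = [-0.3, -0.1, -0.05, -0.02]
ANGLE_POS = [0.02, 0.05, 0.1, 0.3]
RATE_NEG = [-0.9, -0.5, -0.1, -0.05]  # assumes row 3 is [-0.5, -0.1)
RATE_POS = [0.05, 0.1, 0.5, 0.9]


def bin_index(x, neg_edges, pos_edges):
    """Return the 1-based interval number (1..10) for value x.

    Negative intervals are closed on the left, e.g. [-0.3, -0.1);
    non-negative intervals are closed on the right, e.g. (0.02, 0.05].
    """
    if x < 0:
        return bisect_right(neg_edges, x) + 1
    return bisect_left(pos_edges, x) + 6


def state_index(angle, rate):
    """Hypothetical flattening of the two interval numbers into 0..99."""
    a = bin_index(angle, ANGLE_NEG, ANGLE_POS)
    r = bin_index(rate, RATE_NEG, RATE_POS)
    return (a - 1) * 10 + (r - 1)
```

Using `bisect_right` on the negative side and `bisect_left` on the positive side reproduces the open and closed interval ends exactly as the tables list them, so a boundary value such as -0.3 rad falls in interval 2 and 0.02 rad falls in interval 6.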

No.	Description	Value/(N·m)
1	Fairly large, leftward	-10
2	Fairly small, leftward	-1
3	Zero	0
4	Fairly small, rightward	1
5	Fairly large, rightward	10
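
With a finite state set and the five torque levels above, a tabular reinforcement learning controller can be stored as a plain table of action values. The abstract only says a tabular algorithm is used, so this sketch assumes standard Q-learning with epsilon-greedy exploration. The state count of 100 (ten angle intervals times ten angular-velocity intervals), the learning rate `alpha`, the discount `gamma`, and all function names are assumptions, not the authors' settings.

```python
import random

TORQUES = [-10.0, -1.0, 0.0, 1.0, 10.0]  # N·m, from the action table
N_STATES = 100                           # assumed: 10 x 10 state intervals


def make_q_table(n_states=N_STATES, n_actions=len(TORQUES)):
    """Create a zero-initialised table of action values."""
    return [[0.0] * n_actions for _ in range(n_states)]


def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy choice of an action index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """One Q-learning backup: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = 0.0 if terminal else max(q[s_next])
    target = reward + gamma * best_next
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]
```

The applied torque would be `TORQUES[choose_action(q, state, epsilon)]`, which makes the limitation named in the abstract visible: the controller can only ever output one of five fixed values.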

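Parameter	Symbol	Value
Flywheel mass/kg	m_f	15.4
Flywheel moment of inertia about X axis/(kg·m²)	I_fx	0.0457
Flywheel moment of inertia about Y axis/(kg·m²)	I_fy	0.0457
Flywheel moment of inertia about Z axis/(kg·m²)	I_fz	0.085
Gyro gimbal principal-axis moment of inertia/(kg·m²)	-	0
Body principal-axis moment of inertia/(kg·m²)	I_{b_y}	14.56
Height of body center of mass/m	h_b	0.35
Height of flywheel center of mass/m	h_f	0.3
Height of gyro gimbal center of mass/m	h_g	0
Body mass/kg	m_b	89.2
Flywheel spin rate/rpm	Ω	4000
Gyro gimbal mass/kg	m_g	0
Gravitational acceleration/(N/kg)	g	9.8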
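
Two quantities that follow directly from the parameter table give a feel for the scale of the balancing problem. The sketch below is a back-of-the-envelope calculation, not the paper's Lagrangian model: it assumes the flywheel spins about its Z axis (the axis with the largest listed inertia) and treats the robot as a rigid inverted pendulum pivoting about the ground contact line. The function names are hypothetical.

```python
import math

# Values copied from the parameter table.
m_f, h_f = 15.4, 0.3   # flywheel mass (kg) and centre-of-mass height (m)
m_b, h_b = 89.2, 0.35  # body mass (kg) and centre-of-mass height (m)
I_fz = 0.085           # flywheel Z-axis moment of inertia (kg·m²)
omega_rpm = 4000.0     # flywheel spin rate (rpm)
g = 9.8                # gravitational acceleration (N/kg)


def spin_angular_momentum(inertia=I_fz, rpm=omega_rpm):
    """Angular momentum I*Omega in N·m·s, assuming Z is the spin axis."""
    return inertia * rpm * 2.0 * math.pi / 60.0


def gravity_torque(theta):
    """Tipping torque (N·m) at roll angle theta for a rigid inverted pendulum."""
    return (m_b * h_b + m_f * h_f) * g * math.sin(theta)
```

Under these assumptions the flywheel stores about 35.6 N·m·s of angular momentum, and a tilt of 0.02 rad produces a tipping torque of roughly 7.0 N·m. The gimbal contributes nothing to either figure because its mass and height are listed as zero.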

