Journal of Systems Engineering and Electronics ›› 2010, Vol. 32 ›› Issue (5): 1043-1046. doi: 10.3969/j.issn.1001-506X.2010.05.035

• Guidance, Navigation and Control •

Reinforcement learning algorithm based on information entropy

ZHAO Yun, CHEN Qing-wei, HU Wei-li

  1. (School of Automation, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China)
  • Online: 2010-05-24 Published: 2010-01-03

Abstract:

To address the problem of balancing exploration and exploitation in reinforcement learning, a reinforcement learning algorithm based on information entropy is proposed. Using the concept of information entropy, a new state importance measure is defined that quantifies the degree of association between a state and the goal. Based on this measure, an exploration mechanism is designed that adaptively adjusts the balance between exploration and exploitation during learning. In addition, by setting a variable threshold on the measure, the state space is pruned autonomously, eventually yielding a suitable, smaller state space, which saves computing resources and speeds up learning. Simulation results show that the proposed algorithm achieves good learning performance.
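
The abstract does not give the formula for the state importance measure, so the following is only an illustrative sketch of the general idea, not the authors' definition. It assumes, hypothetically, that importance is one minus the normalized Shannon entropy of a Boltzmann distribution over a state's Q-values, that the exploration rate grows with that entropy, and that pruning keeps the states whose importance reaches a caller-supplied threshold. All function names and parameters (action_entropy, state_importance, exploration_rate, prune_states, temperature, eps_min, eps_max) are assumptions introduced for this example.

```python
import math


def action_entropy(q_values, temperature=1.0):
    """Normalized Shannon entropy (0..1) of a Boltzmann distribution over Q-values.

    Assumption: the paper's measure is not specified in the abstract; this is
    one plausible entropy-based quantity, used here only for illustration.
    """
    if len(q_values) < 2:
        return 0.0
    q_max = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(q_values))


def state_importance(q_values, temperature=1.0):
    """Hypothetical importance: high when one action clearly dominates."""
    return 1.0 - action_entropy(q_values, temperature)


def exploration_rate(q_values, eps_min=0.05, eps_max=1.0, temperature=1.0):
    """Hypothetical adaptive epsilon: explore more where entropy is high."""
    h = action_entropy(q_values, temperature)
    return eps_min + (eps_max - eps_min) * h


def prune_states(q_table, threshold, temperature=1.0):
    """Keep only states whose hypothetical importance reaches the threshold.

    The threshold is passed in on each call, so a caller can vary it
    over the course of learning.
    """
    return {
        state: q_values
        for state, q_values in q_table.items()
        if state_importance(q_values, temperature) >= threshold
    }


if __name__ == "__main__":
    q_table = {"undecided": [0.0, 0.0], "decided": [10.0, 0.0]}
    for state, q_values in q_table.items():
        print(state, state_importance(q_values), exploration_rate(q_values))
    print(sorted(prune_states(q_table, threshold=0.5)))
```

In this sketch a state with equal Q-values has maximal entropy, so it receives the largest exploration rate and the lowest importance, while a state with one dominant action is exploited more and survives pruning. Whether the paper's measure behaves this way cannot be confirmed from the abstract alone.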