Systems Engineering and Electronics ›› 2022, Vol. 44 ›› Issue (3): 884-899.doi: 10.12305/j.issn.1001-506X.2022.03.21

• Systems Engineering •

Survey on policy reuse in reinforcement learning

Li HE1, Liang SHEN3, Hui LI1,2,*, Zhuang WANG1, Wenquan TANG1   

1. School of Computer Science (Software), Sichuan University, Chengdu 610065, China
    2. National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China
    3. Jiangxi Hongdu Aviation Industry Group Company Limited, Nanchang 330024, China
  • Received:2021-01-18 Online:2022-03-01 Published:2022-03-10
  • Contact: Hui LI

Abstract:

Policy reuse (PR) is a transfer learning (TL) method that exploits the internal connections among tasks so that experience and knowledge learned on past tasks can accelerate learning of the current target task. To a large extent, it alleviates the slow convergence and high resource consumption of traditional reinforcement learning (RL), and avoids the difficulty of reusing knowledge across similar problems. This paper surveys PR methods in RL, categorized into policy reconstruction, reward shaping, problem transformation, and similarity measurement; presents the characteristics of each category; and introduces their extensions to multi-agent scenarios and deep RL (DRL). The mapping methods between source and target tasks are then introduced. Finally, based on current applications of PR, some conjectures and assumptions about the future development of this field are described.
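To make the policy-reconstruction idea concrete, the sketch below illustrates one common reuse scheme in the literature (in the spirit of probabilistic policy reuse): at each step the agent follows the source task's policy with some probability ψ, and otherwise acts ε-greedily on the target task's value estimates. The function name, signature, and parameter choices here are illustrative assumptions, not the survey's own notation.

```python
import random

def pi_reuse_action(q_values, source_action, psi, epsilon, rng=random):
    """One decision step of a probabilistic policy-reuse strategy.

    With probability psi, reuse the action suggested by the source
    (past) policy; otherwise act epsilon-greedily on the target
    task's Q-value estimates. In practice psi is decayed over time
    so the agent gradually shifts from reuse to its own policy.
    (Illustrative sketch; names and decay schedule are assumptions.)
    """
    if rng.random() < psi:
        return source_action                 # reuse the past policy
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore randomly
    # exploit: greedy action on the target task's estimates
    return max(range(len(q_values)), key=q_values.__getitem__)

# Deterministic corner cases:
# psi=1.0 always reuses the source action;
# psi=0.0 with epsilon=0.0 always picks the greedy target action.
```

A typical training loop would decay ψ geometrically (e.g. ψ ← 0.95·ψ per step) so early exploration is guided by the transferred policy while the target policy dominates later.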

Key words: reinforcement learning (RL), transfer learning (TL), policy reuse (PR), task mapping

