Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (2): 535-543.doi: 10.12305/j.issn.1001-506X.2025.02.20

• Systems Engineering •

Uncertainty-based Bayesian policy reuse method

Ke FU, Hao CHEN, Yu WANG, Quan LIU, Jian HUANG   

  1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received:2023-09-05 Online:2025-02-25 Published:2025-03-18
  • Contact: Jian HUANG

Abstract:

To address the non-stationarity caused by opponent policy changes in multi-agent competition, this paper proposes an uncertainty-based Bayesian policy reuse algorithm for the setting where the opponent's actions are unavailable online. In the offline phase, an autoencoder is used to model the relationship representation between the agent's trajectories and the opponent's actions during policy learning. In the online phase, the agent evaluates the uncertainty of the opponent type conditioned only on limited interaction information and the pre-built opponent models, and then selects the optimal response policy for execution. The proposed algorithm is evaluated on two scenarios, and the results demonstrate higher recognition accuracy and faster recognition speed than three state-of-the-art baseline methods.
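The core of Bayesian policy reuse can be illustrated with a minimal sketch: maintain a belief over opponent types, update it from observation likelihoods (e.g., derived from an opponent model such as the autoencoder described above), and pick the response policy with the highest expected utility under that belief. All names, numbers, and the likelihood source below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def update_belief(belief, likelihoods):
    """One Bayesian update: posterior proportional to likelihood * prior."""
    posterior = belief * likelihoods
    total = posterior.sum()
    if total == 0:  # degenerate evidence: fall back to a uniform belief
        return np.ones_like(belief) / len(belief)
    return posterior / total

def select_policy(belief, utility):
    """Pick the response policy maximizing expected utility.
    utility[i, j] = payoff of policy j against opponent type i."""
    return int(np.argmax(belief @ utility))

# Example: 3 opponent types, uniform prior.
belief = np.ones(3) / 3

# Hypothetical likelihood of the observed interaction under each type
# (in the paper this would come from the learned opponent models).
likelihoods = np.array([0.7, 0.2, 0.1])
belief = update_belief(belief, likelihoods)

# Hypothetical performance model: rows = opponent types, cols = policies.
utility = np.array([[1.0, 0.2, 0.1],
                    [0.1, 0.9, 0.3],
                    [0.2, 0.1, 0.8]])
best = select_policy(belief, utility)  # index of the best response policy
```

With the numbers above, the posterior concentrates on the first opponent type, so the first response policy is selected; in practice the belief is updated repeatedly as interaction data accumulates online.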

Key words: multi-agent competition, Bayesian policy reuse, reinforcement learning, relationship representation

CLC Number: 
