Systems Engineering and Electronics ›› 2018, Vol. 40 ›› Issue (3): 518-525. doi: 10.3969/j.issn.1001-506X.2018.03.05

• Electronic Technology •

Jamming strategy learning based on positive reinforcement learning and orthogonal decomposition

ZHUANSUN Shaoshuai1,2, YANG Junan1,2, LIU Hui1,2, HUANG Keju1,2   

  1. College of Electronic Countermeasure, National University of Defense Technology, Hefei 230037, China; 2. Key Laboratory of Electronic Restriction of Anhui Province, Hefei 230037, China
  • Online: 2018-02-26  Published: 2018-02-24

Abstract:

As a self-learning and online learning method, reinforcement learning interacts with a dynamic environment continually in a trial-and-error manner and thereby learns the optimal strategy, and it has become an important branch of machine learning. Current research on wireless communication jamming strategies either relies on prior information or learns too slowly. To overcome these drawbacks, a jamming strategy selection algorithm based on positive reinforcement learning and orthogonal decomposition (PRL-OD) is proposed. The algorithm uses positive reinforcement to increase the probability that the optimal action is selected, which in turn accelerates the learning speed (convergence rate) of the system. In particular, when the constellation of the communication signal is distorted by factors such as channel noise or imbalance between the in-phase and quadrature channels, the proposed orthogonal decomposition method can learn the in-phase and quadrature components of the optimal jamming signal, that is, the optimal jamming pattern is obtained through learning. Simulation results show that the PRL-OD algorithm learns the optimal jamming parameters and the optimal jamming pattern faster than existing jamming strategy selection algorithms: in the same jamming tasks it needs fewer interactions and achieves a better jamming effect.
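To make the abstract's two ideas concrete, here is a minimal sketch in Python of a positive-reinforcement update over candidate jamming waveforms. It is an illustration under assumed conditions, not the paper's actual algorithm: the grid of candidate (I, Q) amplitudes, the step size LEARN_RATE, and the stand-in reward function jamming_reward are all hypothetical choices introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate jamming actions: pairs of (in-phase amplitude, quadrature
# amplitude) on a coarse grid. This discrete action set is an assumption
# made for illustration, not the paper's actual parameter space.
i_levels = np.linspace(-1.0, 1.0, 5)
q_levels = np.linspace(-1.0, 1.0, 5)
actions = [(i, q) for i in i_levels for q in q_levels]

# Selection probabilities over actions, initialised uniformly.
probs = np.full(len(actions), 1.0 / len(actions))

LEARN_RATE = 0.05  # positive-reinforcement step size (assumed value)

def jamming_reward(i_amp, q_amp):
    """Stand-in environment: reward peaks at a hidden optimal (I, Q) pair.

    In the paper the feedback would be the measured jamming effect on the
    communication link; this toy function merely mimics such feedback.
    """
    opt_i, opt_q = 0.5, -0.5                     # hidden optimum (assumed)
    dist2 = (i_amp - opt_i) ** 2 + (q_amp - opt_q) ** 2
    return float(dist2 < 0.1)                    # 1 near the optimum, else 0

for step in range(2000):
    k = rng.choice(len(actions), p=probs)        # sample an action to try
    reward = jamming_reward(*actions[k])
    if reward > 0:
        # Positive reinforcement only: shift probability mass toward the
        # rewarded action (a linear reward-inaction style update).
        # Unrewarded trials leave the distribution unchanged.
        probs = (1.0 - LEARN_RATE) * probs
        probs[k] += LEARN_RATE

best_i, best_q = actions[int(np.argmax(probs))]
print(f"learned jamming components: I={best_i:+.2f}, Q={best_q:+.2f}")
# The learned pair defines the jamming waveform via its orthogonal
# decomposition: j(t) = I * cos(2*pi*f*t) - Q * sin(2*pi*f*t).
```

Because unrewarded trials leave the distribution untouched, probability mass accumulates only on actions that actually disrupt the link, which is the mechanism the abstract credits for the faster convergence of the PRL-OD approach.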