Systems Engineering and Electronics ›› 2018, Vol. 40 ›› Issue (1): 217-224.

• Software, Algorithms and Simulation •

Nash-Q-based confrontation simulation technology for network information system-of-systems

闫雪飞, 李新明, 刘东, 王寿彪   

  1. Complex Electronic System Simulation Laboratory, Equipment Academy, Beijing 101416
  • Publication date: 2018-01-08 Release date: 2018-01-08

Confrontation simulation for network information system-of-systems based on Nash-Q

YAN Xuefei, LI Xinming, LIU Dong, WANG Shoubiao   

  1. Science and Technology on Complex Electronic System Simulation Laboratory, Equipment Academy, Beijing 101416, China
  • Online:2018-01-08 Published:2018-01-08

Abstract (translated from the Chinese):

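Combat simulation of weapon equipment system-of-systems (SoS) falls within complex-systems research. This paper is the first exploratory study of Nash-Q-based cognitive decision-making behavior in network information system-of-systems (NISoS) confrontation. The Nash-Q algorithm has a form similar to that of joint Q-learning; the two differ in how the joint policy is computed. For a zero-sum game model of SoS combat, Nash-Q obtains a mixed strategy by solving for the Nash equilibrium without needing the history of other Agents, so it is easier to implement and more efficient. A campaign-level zero-sum dynamic game model of combat is established, and a method for solving the Nash equilibrium without complete information about the other Agents is given. In addition, a Gaussian radial basis function neural network is used to discretize the Q-table, giving the algorithm better discretization performance and generalization ability. Finally, NISoS combat simulation experiments verify the effectiveness of the algorithm, show that it achieves higher payoffs than Q-learning-based and rule-based decision algorithms, and show that it performs well in offline decision-making.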
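
The zero-sum Nash-Q update mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes two Agents with two actions each, so that the stage game at every state is a 2x2 matrix game with a closed-form equilibrium. The names `solve_2x2_zero_sum` and `nash_q_update`, the dictionary layout of `Q`, and the default values of `alpha` and `gamma` are all hypothetical choices made for illustration.

```python
def solve_2x2_zero_sum(A):
    """Solve a 2x2 zero-sum matrix game for the row (maximizing) player.

    A[i][j] is the row player's payoff. Returns (p, value), where p is the
    probability of playing row 0 at equilibrium and value is the game value.
    """
    (a, b), (c, d) = A
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        # A saddle point exists, so a pure strategy is optimal.
        p = 1.0 if min(a, b) >= min(c, d) else 0.0
        return p, float(maximin)
    # No saddle point: the mixed equilibrium makes the column player indifferent.
    denom = a - b - c + d
    p = (d - c) / denom
    value = (a * d - b * c) / denom
    return p, value


def nash_q_update(Q, s, a_row, a_col, reward, s_next, alpha=0.1, gamma=0.9):
    """One Nash-Q step for the row player in a two-player zero-sum game.

    Q[s] is the 2x2 joint-action table at state s (hypothetical layout).
    The bootstrap target uses the equilibrium value of the next stage game,
    which needs no record of the opponent's past actions.
    """
    _, v_next = solve_2x2_zero_sum(Q[s_next])
    Q[s][a_row][a_col] += alpha * (reward + gamma * v_next - Q[s][a_row][a_col])
    return Q[s][a_row][a_col]


# Usage: state 1 looks like matching pennies, whose game value is 0.
Q = {0: [[0.0, 0.0], [0.0, 0.0]], 1: [[1.0, -1.0], [-1.0, 1.0]]}
nash_q_update(Q, s=0, a_row=0, a_col=1, reward=1.0, s_next=1)
```

For larger action sets the closed form no longer applies, and the equilibrium of each stage game would typically be found with a linear program instead.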

Abstract:

Battle simulation of weapon equipment system-of-systems (SoS) belongs to the field of complex-systems research. This paper explores cognitive decision-making in network information system-of-systems (NISoS) confrontation based on Nash-Q. Nash-Q has a form similar to joint Q-learning; the two differ in how the joint policy is computed. For a zero-sum game model of SoS battle simulation, Nash-Q is easier to implement and more efficient, because it obtains a mixed strategy by solving for the Nash equilibrium and does not need the action history of other Agents. A zero-sum dynamic game model of combat at the campaign level is built, and a method for solving the Nash equilibrium is given even though complete information about the other Agents is not known. A Gaussian radial basis function neural network is used to discretize the Q-table, improving the discretization performance and generalization ability of Nash-Q. Finally, NISoS battle simulation experiments validate the effectiveness of the algorithm. Compared with Q-learning and rule-based decision algorithms, the proposed algorithm achieves higher gains and performs well in off-line decision-making.
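
The abstract does not specify the network architecture, so the following is only a rough sketch of how a Gaussian radial basis function layer can stand in for a Q-table. It assumes a linear-in-weights approximator with fixed centers and a single shared width; the class name `RBFQ`, its methods, and the parameters `centers`, `sigma`, and `alpha` are hypothetical.

```python
import math


class RBFQ:
    """Q(s, a) approximated as a weighted sum of Gaussian RBF features of s."""

    def __init__(self, centers, sigma, n_actions):
        self.centers = centers  # fixed RBF centers in state space (assumed)
        self.sigma = sigma      # shared Gaussian width (assumed)
        # One weight vector per (joint) action, initialised to zero.
        self.w = [[0.0] * len(centers) for _ in range(n_actions)]

    def features(self, state):
        """Gaussian activation of every center for the given state."""
        two_var = 2.0 * self.sigma ** 2
        return [
            math.exp(-sum((x - c) ** 2 for x, c in zip(state, center)) / two_var)
            for center in self.centers
        ]

    def value(self, state, action):
        phi = self.features(state)
        return sum(w * f for w, f in zip(self.w[action], phi))

    def update(self, state, action, target, alpha=0.5):
        """Gradient step that moves Q(state, action) toward target."""
        phi = self.features(state)
        error = target - sum(w * f for w, f in zip(self.w[action], phi))
        for i, f in enumerate(phi):
            self.w[action][i] += alpha * error * f
        return error


# Usage: fit one state, then query a nearby state that was never visited.
q = RBFQ(centers=[(0.0,), (1.0,)], sigma=0.5, n_actions=2)
for _ in range(50):
    q.update((0.0,), action=0, target=1.0)
nearby = q.value((0.1,), action=0)
```

Because neighboring states activate the same centers, an update at one state also shifts the estimate at nearby states, which is the generalization effect the abstract refers to. In a Nash-Q setting the `target` would be the reward plus the discounted equilibrium value of the next stage game.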