Systems Engineering and Electronics (系统工程与电子技术) ›› 2019, Vol. 41 ›› Issue (7): 1652-1657. doi: 10.3969/j.issn.1001-506X.2019.07.29

• Communications and Networks •

eNB selection for LTE-V using deep reinforcement learning

XIE Hao1, GUO Aihuang1,2, SONG Chunlin1, JIAO Runze1   

  1. School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China;
  2. State Key Laboratory of Millimeter Waves, Southeast University, Nanjing 210092, China
  • Online: 2019-06-28 Published: 2019-07-09

Abstract: In long term evolution-vehicle (LTE-V) networks, vehicles contend for network access at random, which easily causes network congestion. To address this problem, an algorithm based on deep reinforcement learning (DRL) is proposed that selects the best evolved node B (eNB) for each vehicle to access. The mobility management entity (MME) in the LTE core network acts as the agent and takes both the network-side load and the receiving rate at the receiver into account when matching vehicles to eNBs, which reduces the blocking probability and the communication delay of the LTE-V network. A dueling-double deep Q-network (D-DDQN) is adopted to fit the target action-value function (AVF), converting high-dimensional state inputs into low-dimensional action outputs. Simulations show that, once the D-DDQN has been trained and its parameters have converged, the blocking probability of the LTE-V network is reduced significantly and the overall performance of the network improves considerably.

Key words: long term evolution-vehicle (LTE-V), deep reinforcement learning (DRL), evolved node B (eNB) selection, network blocking probability, network load balancing
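
The abstract names a dueling-double deep Q-network but gives no architecture, reward, or state definition, so the following is only a minimal sketch of the two generic ingredients that name usually implies: the dueling aggregation of a state value and per-action advantages into Q-values, and the double-DQN target, in which the online network selects the next action and the target network evaluates it. The function names, the three-action setup (read here as three candidate eNBs), and all numbers are assumptions made for illustration, not values from the paper.

```python
import numpy as np


def dueling_q_values(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with the mean-advantage baseline."""
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double-DQN target for one transition.

    The online network's Q-values choose the next action; the target
    network's Q-values supply the estimate for that chosen action.
    """
    if done:
        return float(reward)
    best_action = int(np.argmax(q_online_next))
    return float(reward + gamma * q_target_next[best_action])


# Hypothetical next-state outputs for three candidate eNBs (assumed numbers).
q_online_next = dueling_q_values(1.0, [0.5, -0.5, 0.0])  # online network
q_target_next = dueling_q_values(0.8, [0.1, 0.4, -0.5])  # target network

# The online network prefers action 0, so the target uses q_target_next[0].
target = double_dqn_target(reward=1.0, gamma=0.9,
                           q_online_next=q_online_next,
                           q_target_next=q_target_next)
print(q_online_next, q_target_next, target)
```

In this toy case the target network's own maximum lies on a different action than the one the online network picks, so the double-DQN target is lower than the plain `reward + gamma * max(q_target_next)` would be. This decoupling of selection from evaluation is what the "double" part is generally meant to provide.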