Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (6): 1917-1929. doi: 10.12305/j.issn.1001-506X.2025.06.20

• Systems Engineering •

面向卫星在轨处理的强化学习任务调度算法

孟麟芝1,2,3, 孙小涓1,2,3,*, 胡玉新1,2,3, 高斌1,2, 孙国庆1,2, 牟文浩1,2   

    1. 中国科学院空天信息创新研究院, 北京 100190
    2. 中国科学院空间信息处理与应用系统技术重点实验室, 北京 100190
    3. 中国科学院大学电子电气与通信工程学院, 北京 100049
  • 收稿日期:2024-06-26 出版日期:2025-06-25 发布日期:2025-07-09
  • 通讯作者: 孙小涓
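  • About the authors: Linzhi MENG (b. 2000), male, master's student; main research interests: big data and cloud computing
    Xiaojuan SUN (b. 1980), female, research professor, Ph.D.; main research interests: spatial information processing and high-performance computing
    Yuxin HU (b. 1981), male, research professor, Ph.D.; main research interest: spatial information processing systems
    Bin GAO (b. 1990), male, assistant researcher, master's degree; main research interest: satellite ground application systems
    Guoqing SUN (b. 1992), male, assistant researcher, master's degree; main research interest: satellite ground application systems
    Wenhao MU (b. 1995), male, engineer, master's degree; main research interest: satellite ground application systems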

Reinforcement learning task scheduling algorithm for satellite on-orbit processing

Linzhi MENG1,2,3, Xiaojuan SUN1,2,3,*, Yuxin HU1,2,3, Bin GAO1,2, Guoqing SUN1,2, Wenhao MU1,2   

    1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
    2. Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Beijing 100190, China
    3. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2024-06-26 Online: 2025-06-25 Published: 2025-07-09
  • Contact: Xiaojuan SUN

摘要:

随着卫星对地观测进入多卫星、高分辨率、实时响应、全球观测的时代, 卫星在轨数据处理已成为提高遥感数据处理实时性的主流手段之一。在卫星资源受限、数传链路信道资源受限、随遇观测任务具有不可预测性的场景下, 进行数据处理任务实时调度具有较大挑战。首先,构建以最大化系统平均数据处理吞吐率为目标的优化问题模型。然后,提出一种在线的结合深度强化学习(deep reinforcement learning, DRL)的任务调度算法, 采用DRL算法能够实时计算任务调度策略, 选取拉格朗日对偶优化算法能够准确计算最优资源分配量。最后,通过仿真实验对算法有效性和数据处理吞吐率进行评价, 结果表明算法能够收敛并接近最优解, 相比于已有算法将数据处理吞吐率提高了约8%, 且在卫星数据到达速率及卫星计算节点数量增大时具有一定扩展性。所提算法能够在最大化系统平均数据处理吞吐率的同时, 保障高动态环境下任务队列长度及平均能耗稳定收敛。

关键词: 卫星在轨处理, 任务调度, 资源分配, 深度强化学习, 李雅普诺夫优化

Abstract:

As satellite Earth observation enters an era of multi-satellite, high-resolution, real-time, global coverage, on-orbit data processing has become one of the main ways to improve the timeliness of remote sensing data processing. Real-time scheduling of data processing tasks is challenging when satellite resources are limited, data transmission link channels are constrained, and opportunistic observation tasks are unpredictable. First, an optimization problem model is constructed with the goal of maximizing the system's average data processing throughput rate. Then, an online task scheduling algorithm incorporating deep reinforcement learning (DRL) is proposed: the DRL algorithm computes task scheduling strategies in real time, and a Lagrangian dual optimization algorithm accurately computes the optimal resource allocation. Finally, simulation experiments are conducted to evaluate the effectiveness and data processing throughput rate of the proposed algorithm. The results show that the algorithm converges and approaches the optimal solution, improves the data processing throughput rate by approximately 8% compared with existing algorithms, and shows a degree of scalability as the satellite data arrival rate and the number of satellite computing nodes increase. The proposed algorithm maximizes the system's average data processing throughput rate while keeping the task queue length and average energy consumption stable and convergent in a highly dynamic environment.

Key words: satellite on-orbit processing, task scheduling, resource allocation, deep reinforcement learning (DRL), Lyapunov optimization

CLC number: