Reinforcement learning task scheduling algorithm for satellite on-orbit processing

Systems Engineering and Electronics, 2025, Vol. 47, Issue (6): 1917-1929. doi: 10.12305/j.issn.1001-506X.2025.06.20

Linzhi MENG¹,²,³, Xiaojuan SUN¹,²,³,*, Yuxin HU¹,²,³, Bin GAO¹,², Guoqing SUN¹,², Wenhao MU¹,²

1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
2. Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China
3. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China

Received: 2024-06-26 Online: 2025-06-25 Published: 2025-07-09
Corresponding author: Xiaojuan SUN
About the authors: Linzhi MENG (born 2000), male, master's student; main research interests: big data and cloud computing
Xiaojuan SUN (born 1980), female, research fellow, Ph.D.; main research interests: spatial information processing, high-performance computing
Yuxin HU (born 1981), male, research fellow, Ph.D.; main research interest: spatial information processing systems
Bin GAO (born 1990), male, assistant research fellow, master's degree; main research interest: satellite ground application systems
Guoqing SUN (born 1992), male, assistant research fellow, master's degree; main research interest: satellite ground application systems
Wenhao MU (born 1995), male, engineer, master's degree; main research interest: satellite ground application systems

Abstract:

As satellite Earth observation enters an era of multi-satellite, high-resolution, real-time, global coverage, on-orbit data processing has become one of the main ways to improve the timeliness of remote sensing data processing. Scheduling data processing tasks in real time is challenging when satellite resources are limited, data transmission link channel resources are constrained, and opportunistic observation tasks arrive unpredictably. First, an optimization problem is formulated with the objective of maximizing the system's average data processing throughput. Then, an online task scheduling algorithm combining deep reinforcement learning (DRL) is proposed: the DRL algorithm computes the task scheduling policy in real time, and a Lagrangian dual optimization algorithm accurately computes the optimal resource allocation. Finally, simulation experiments evaluate the algorithm's effectiveness and data processing throughput. The results show that the algorithm converges and approaches the optimal solution, improves data processing throughput by approximately 8% compared with existing algorithms, and scales to some extent as the satellite data arrival rate and the number of satellite computing nodes increase. The proposed algorithm maximizes the system's average data processing throughput while keeping task queue length and average energy consumption stable and convergent in a highly dynamic environment.

Key words: satellite on-orbit processing, task scheduling, resource allocation, deep reinforcement learning (DRL), Lyapunov optimization
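The abstract gives no formulas, so the following is only a toy sketch of the kind of per-slot decision that Lyapunov optimization typically leads to (the drift-plus-penalty rule), under assumptions that are not taken from the paper: a single data queue, a virtual energy queue, and a tiny hypothetical action set of (processing rate, energy) pairs. Brute-force enumeration stands in here for both the DRL scheduler and the Lagrangian dual resource allocation described above; the function names, the weight `v`, the energy budget, and all numbers are illustrative.

```python
import random

# Hypothetical (processing rate, energy cost) options per time slot.
ACTIONS = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5), (3.0, 3.0)]


def drift_plus_penalty_step(q, y, arrival, actions, v, energy_budget):
    """Run one time slot and return (processed, energy, q_next, y_next).

    q: data queue backlog, y: virtual energy queue backlog,
    v: weight on throughput relative to queue stability (assumed parameter).
    """

    def score(action):
        rate, energy = action
        processed = min(rate, q)  # cannot process more data than is queued
        # Reward throughput (weighted by backlog and v), penalize energy use
        # in proportion to how far past the budget the system has drifted.
        return (q + v) * processed - y * energy

    rate, energy = max(actions, key=score)  # ties go to the earliest action
    processed = min(rate, q)
    q_next = q - processed + arrival                 # data queue update
    y_next = max(y + energy - energy_budget, 0.0)    # virtual energy queue update
    return processed, energy, q_next, y_next


def simulate(slots=2000, v=10.0, energy_budget=1.0, seed=0):
    """Simulate random arrivals and return (avg_rate, avg_energy, q, y)."""
    rng = random.Random(seed)
    q = y = 0.0
    total_rate = total_energy = 0.0
    for _ in range(slots):
        arrival = rng.uniform(0.0, 2.0)  # hypothetical arrival process
        processed, energy, q, y = drift_plus_penalty_step(
            q, y, arrival, ACTIONS, v, energy_budget
        )
        total_rate += processed
        total_energy += energy
    return total_rate / slots, total_energy / slots, q, y


if __name__ == "__main__":
    avg_rate, avg_energy, q, y = simulate()
    print(f"avg throughput={avg_rate:.3f}, avg energy={avg_energy:.3f}, "
          f"final data queue={q:.3f}, final energy queue={y:.3f}")
```

In this sketch the virtual queue `y` grows whenever a slot spends more than the budget, which makes high-energy actions progressively less attractive; by construction the average energy over T slots is at most the budget plus `y / T`. A learned policy would replace the enumeration when the action space is too large to search in each slot.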

CLC number: TP391

Linzhi MENG, Xiaojuan SUN, Yuxin HU, Bin GAO, Guoqing SUN, Wenhao MU. Reinforcement learning task scheduling algorithm for satellite on-orbit processing[J]. Systems Engineering and Electronics, 2025, 47(6): 1917-1929.

Figures/Tables: 10 (Fig. 1 to Fig. 10)
