Systems Engineering and Electronics ›› 2026, Vol. 48 ›› Issue (5): 1502-1514. doi: 10.12305/j.issn.1001-506X.2026.05.06

• Sensors and Signal Processing •

基于CNN-BiLSTM-MHA时空融合框架的毫米波雷达人体姿态估计

罗雨泉1, 何雨强1, 李雅鑫2,*, 梁松2, 王俊1

  1. 北京航空航天大学电子信息工程学院,北京 100191
    2. 北京航空航天大学杭州创新研究院,浙江 杭州 310051
  • 收稿日期:2025-02-24 接受日期:2025-06-23 出版日期:2026-05-27 发布日期:2026-05-27
  • 通讯作者: 李雅鑫 E-mail:luoyuquanhz@163.com;buaahyq@buaa.edu.cn;lyx_hnu@126.com
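  • About the authors: Yuquan LUO (b. 1994), male, Ph.D. candidate; research interests: radar signal processing, deep learning, and human sensing
    Yuqiang HE (b. 1995), male, Ph.D. candidate; research interests: radar signal processing and target tracking
    Song LIANG (b. 1997), male, assistant engineer, M.S.; research interests: radar signal processing and human sensing
    Jun WANG (b. 1972), male, professor, Ph.D.; research interests: signal processing, target recognition, and tracking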
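  • Funding:
    Supported by the "Pioneer" and "Leading Goose" R&D Program of the Zhejiang Provincial Science and Technology Plan (2023C01148) and the Hangzhou Leading Innovation and Entrepreneurship Team program (TD2022006)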

Millimeter-wave radar human pose estimation based on CNN-BiLSTM-MHA spatio-temporal fusion framework

Yuquan LUO1, Yuqiang HE1, Yaxin LI2,*, Song LIANG2, Jun WANG1

  1. School of Electronic Information Engineering, Beihang University, Beijing 100191, China
    2. Hangzhou Innovation Institute of Beihang University, Hangzhou 310051, China
  • Received: 2025-02-24 Accepted: 2025-06-23 Online: 2026-05-27 Published: 2026-05-27
  • Contact: Yaxin LI E-mail: luoyuquanhz@163.com; buaahyq@buaa.edu.cn; lyx_hnu@126.com

摘要:

人体姿态估计在人机交互、活动识别与健康监测等领域具有广泛的应用前景。传统基于光学传感器的方法易受光照条件限制且存在隐私泄露风险,而基于可穿戴设备的技术则存在使用繁琐、长期佩戴不适等问题。为此,提出一种基于卷积神经网络(convolutional neural network, CNN)、双向长短期记忆(bidirectional long-short term memory, BiLSTM)网络和多头注意力(multi-head attention, MHA)机制时空融合框架的毫米波雷达人体姿态估计方法。通过自主研发的毫米波雷达设备生成高质量点云数据,引入滑动窗口机制将单帧点云扩展为多帧时间序列数据。结合CNN提取空间特征,采用BiLSTM进行时序建模,引入MHA机制进一步优化全局特征表达能力。基于多帧点云数据的时空信息融合框架能够充分挖掘时空特征,有效缓解雷达点云稀疏性问题,显著提升了姿态估计的精度与鲁棒性。实验结果表明,所提方法能够实现25个骨骼关节点的定位,xyz轴平均误差分别为2.69 cm、2.49 cm与2.98 cm,为毫米波雷达在人体姿态估计中的应用提供了解决方案,具有广泛的实际应用潜力。

关键词: 毫米波雷达, 人体姿态估计, 卷积神经网络, 双向长短期记忆网络, 多头注意力机制

Abstract:

Human pose estimation has broad application prospects in human-computer interaction, activity recognition, and health monitoring. Traditional methods based on optical sensors are limited by lighting conditions and carry a risk of privacy leakage, while wearable-device-based technologies are cumbersome to use and uncomfortable during long-term wear. To address these challenges, a millimeter-wave radar human pose estimation method is proposed, built on a spatio-temporal fusion framework that combines a convolutional neural network (CNN), a bidirectional long short-term memory (BiLSTM) network, and a multi-head attention (MHA) mechanism. High-quality point cloud data are generated by self-developed millimeter-wave radar equipment, and a sliding window mechanism is introduced to expand single-frame point clouds into multi-frame time-series data. Spatial features are extracted by the CNN, temporal modeling is performed by the BiLSTM, and the MHA mechanism further refines the global feature representation. By fusing spatio-temporal information across multi-frame point cloud data, the framework fully exploits spatio-temporal features, mitigates the sparsity of radar point clouds, and significantly improves the accuracy and robustness of pose estimation. Experimental results show that the proposed method localizes 25 skeletal joints, with average errors of 2.69 cm, 2.49 cm, and 2.98 cm along the x, y, and z axes, respectively. The method provides a solution for millimeter-wave radar human pose estimation and has broad practical application potential.
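
The sliding window step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the window length, the stride, or how the frames inside a window are combined, so the names `sliding_windows` and `merge_window`, the `window` and `stride` parameters, and the frame-offset tag are all assumptions made for illustration and are not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # one radar detection (x, y, z)
Frame = List[Point]                 # one single-frame point cloud


def sliding_windows(frames: Sequence[Frame], window: int,
                    stride: int = 1) -> List[List[Frame]]:
    """Group consecutive frames into overlapping windows of `window` frames.

    Assumption: only complete windows are returned, so a sequence shorter
    than `window` yields an empty list.
    """
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    return [
        list(frames[start:start + window])
        for start in range(0, len(frames) - window + 1, stride)
    ]


def merge_window(frames_in_window: Sequence[Frame]
                 ) -> List[Tuple[float, float, float, int]]:
    """Flatten one window into a denser cloud.

    Each point keeps its coordinates and gains a hypothetical fourth field,
    the frame offset inside the window, so temporal order is not lost.
    """
    return [
        (x, y, z, offset)
        for offset, frame in enumerate(frames_in_window)
        for (x, y, z) in frame
    ]


# Toy sequence: frame i holds i + 1 identical points at height z = i.
frames = [[(0.0, 0.0, float(i))] * (i + 1) for i in range(5)]
windows = sliding_windows(frames, window=3)
merged = merge_window(windows[1])
```

Each window holds more points than any single frame, which is the sense in which stacking frames can offset point cloud sparsity; a sequence model can then consume the windows in order.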

Key words: millimeter-wave radar, human pose estimation, convolutional neural network (CNN), bidirectional long short-term memory (BiLSTM) network, multi-head attention (MHA)
