Systems Engineering and Electronics ›› 2024, Vol. 46 ›› Issue (6): 1867-1877. doi: 10.12305/j.issn.1001-506X.2024.06.05

• Electronic Technology •

Object grasp pose detection based on the region of interest

Xiantao SUN1, Wangyang JIANG1, Wenjie CHEN1,*, Weihai CHEN2, Yali ZHI1

  1. School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
  2. School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
  • Received: 2023-03-02 Online: 2024-05-25 Published: 2024-06-04
  • Corresponding author: Wenjie CHEN
  • About the authors: Xiantao SUN (1985—), male, associate professor, Ph.D.; research interests: machine vision, underactuated manipulators
    Wangyang JIANG (1998—), male, M.S. candidate; research interests: machine vision, robotic grasping
    Wenjie CHEN (1964—), male, professor, Ph.D.; research interests: machine vision, power-assisted exoskeletons
    Weihai CHEN (1955—), male, professor, Ph.D.; research interests: robotic grasping, design and control of high-precision motion machinery
    Yali ZHI (1987—), female, lecturer, Ph.D.; research interests: intermittent control, artificial intelligence
  • Funding:
    National Natural Science Foundation of China (52005001)

Abstract:

In industrial production, the objects to be grasped are often of various types, cluttered in placement, and irregular in shape, which makes it difficult to accurately obtain their grasp poses. To address these problems, this paper proposes a two-stage grasp pose estimation method based on deep learning. In the first stage, a lightweight rotated object detection algorithm based on an improved you only look once version 4 (YOLOv4) is proposed to increase both the detection speed and the detection accuracy. Firstly, the lightweight network GhostNet and depthwise separable convolutions are used to reconstruct the original network and reduce the number of model parameters. Then, an adaptive spatial feature fusion structure and a parameter-free attention module are added to the neck network to improve the localization accuracy of the region of interest. Finally, an approximate skew intersection over union (SkewIoU) loss is used to handle the periodicity of the angle. In the second stage, a mask of the same size as the original image is generated to extract the region of interest, and an improved DeepLabV3+ algorithm is proposed to detect the grasp poses of objects within that region. Experimental results show that the detection accuracy of the improved YOLOv4 network reaches 92.5%, and that the improved DeepLabV3+ algorithm achieves accuracies of 94.6% and 92.4% on the image-wise and object-wise splits of the Cornell grasping dataset, respectively, while accurately detecting the grasp poses of objects.
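To illustrate the second stage described above, the sketch below shows one way a rotated detection box could be turned into a mask with the same size as the original image and used to extract the region of interest. This is a minimal sketch only, not the authors' implementation: it assumes OpenCV and NumPy, and assumes the detector outputs a box in the ((cx, cy), (w, h), angle) form used by OpenCV rotated rectangles.

# Minimal sketch (not the authors' code): build a full-size mask from a rotated
# detection box and use it to extract the region of interest.
# Assumptions: OpenCV + NumPy; box format ((cx, cy), (w, h), angle_deg).
import cv2
import numpy as np

def extract_roi(image, rotated_box):
    # Corner points of the rotated rectangle, e.g. ((320, 240), (120, 60), 30.0)
    corners = cv2.boxPoints(rotated_box).astype(np.int32)
    # Mask with the same height and width as the original image
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [corners], 255)               # fill the detected region
    roi = cv2.bitwise_and(image, image, mask=mask)   # keep only the region of interest
    return roi, mask

# Hypothetical usage:
# image = cv2.imread("scene.png")
# roi, mask = extract_roi(image, ((320, 240), (120, 60), 30.0))

In the paper's pipeline, the masked image produced in this way would then be passed to the improved DeepLabV3+ network for grasp pose detection; the exact box format and masking details here are assumptions for illustration.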

Key words: deep learning, mask, region of interest, lightweight network, pose detection

CLC number: