Thesis title (Chinese): | 基于特征融合的目标位姿估计算法研究 (Research on Object Pose Estimation Algorithms Based on Feature Fusion) |
Name: | |
Student ID: | 19208049006 |
Confidentiality level: | Public |
Thesis language: | chi (Chinese) |
Discipline code: | 0812 |
Discipline: | Engineering - Computer Science and Technology (degree conferrable in Engineering or Science) |
Student type: | Master's |
Degree: | Master of Engineering |
Degree year: | 2022 |
Institution: | Xi'an University of Science and Technology (西安科技大学) |
Department: | |
Major: | |
Research direction: | Media Computing and Visualization |
First supervisor's name: | |
First supervisor's institution: | |
Thesis submission date: | 2022-06-21 |
Thesis defense date: | 2022-06-06 |
Thesis title (English): | Research on Object Pose Estimation Method Based on Feature Fusion |
Keywords (Chinese): | |
Keywords (English): | Pose estimation ; Feature fusion ; Point cloud ; Texture-less ; Occluded object |
Abstract (Chinese, translated): |
With the development of intelligent manufacturing, a large number of tasks in the industrial field require the assistance of intelligent robots, and pose estimation plays a key role in the interaction between a robot and its environment. Research on pose estimation methods therefore has practical significance. In recent years, pose estimation technology has developed rapidly, yet the problems of texture-less objects and occlusion between objects remain poorly solved. At present, many methods rely on key information extracted from RGB images and seldom consider the effect of illumination on RGB image quality, which makes it difficult to improve estimation accuracy. RGB-D images supplement RGB information with additional depth information, which is conducive to high-precision pose estimation. This thesis studies texture-less and occluded objects; the main research contents and innovations are as follows:

(1) To address the problem that existing dense fusion methods ignore the geometric features among local points of the point cloud, leading to insufficient geometric feature extraction, an object pose estimation method based on feature fusion is proposed. The method first extracts the object's color features and point cloud features from the RGB-D image. Second, through point set abstraction, fine local geometric features are extracted from the point cloud within each region and then extended to larger local regions, yielding local geometric features at different levels as well as the object's global geometric features. Finally, the color and geometric features are fused, and the trained neural network outputs an initial pose. Experiments on the LineMOD and YCB-Video datasets show that, compared with the baseline algorithms, the proposed method improves average pose estimation accuracy by 0.5%-31.5% on LineMOD and by 0.5%-5.6% on YCB-Video.

(2) To address the problem that, in encoder-decoder feature fusion methods, the two feature extraction branches operate independently, limiting the representational ability of the extracted features, a pose estimation method with improved encoder-decoder feature fusion is proposed. First, at each encoder and decoder layer, the RGB image branch uses convolutional layers to extract the object's color features, while the point cloud branch builds a k-nearest-neighbor directed graph and extracts attention features of the points within each neighborhood as local geometric features. Second, the two kinds of features are fused between every encoder layer and decoder layer. Finally, pose estimation is completed. Experiments on the LineMOD and Occlusion LineMOD datasets show that, compared with the baseline algorithms, the proposed method improves average pose estimation accuracy by 2.2%-14.1% on LineMOD and by 5.6%-21.5% on Occlusion LineMOD.

The proposed methods achieve excellent pose estimation accuracy for both texture-less and occluded objects, and have theoretical significance and application value for robot intelligence in practice. |
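As a rough illustration of contribution (1), the sketch below shows one PointNet++-style point set abstraction step (sample centroids, group each centroid's k nearest neighbours, pool a shared MLP over each group) together with dense per-point fusion of color and geometric features. This is a minimal reading of the abstract, not the thesis implementation; all layer sizes, the sampling scheme, and variable names are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the thesis code) of point set abstraction
# plus dense per-point fusion of color and geometric features.
import torch
import torch.nn as nn

def set_abstraction(xyz, feats, num_centroids, k, mlp):
    """xyz: (B, N, 3) coordinates; feats: (B, N, C) per-point features.
    Returns sampled centroids (B, M, 3) and pooled local features (B, M, C_out)."""
    B, N, _ = xyz.shape
    # Random sampling stands in for farthest-point sampling to keep the sketch short.
    idx = torch.stack([torch.randperm(N)[:num_centroids] for _ in range(B)])
    centroids = torch.gather(xyz, 1, idx.unsqueeze(-1).expand(-1, -1, 3))
    # The k nearest neighbours of each centroid define its local region: (B, M, k)
    knn = torch.cdist(centroids, xyz).topk(k, largest=False).indices
    grouped = torch.gather(
        feats.unsqueeze(1).expand(-1, num_centroids, -1, -1),
        2, knn.unsqueeze(-1).expand(-1, -1, -1, feats.shape[-1]))  # (B, M, k, C)
    # Shared MLP on each grouped point, then max-pool over the neighbourhood.
    return centroids, mlp(grouped).max(dim=2).values

B, N = 2, 1024
xyz = torch.rand(B, N, 3)
color_feat = torch.rand(B, N, 32)   # per-point color embedding from the RGB branch
geo_feat = torch.rand(B, N, 32)     # per-point geometric embedding
fused = torch.cat([color_feat, geo_feat], dim=-1)  # dense per-point fusion, (B, N, 64)
mlp = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
centers, local_geo = set_abstraction(xyz, geo_feat, num_centroids=256, k=16, mlp=mlp)
print(local_geo.shape)  # torch.Size([2, 256, 64])
```

Stacking such steps with progressively fewer centroids would give the multi-level local features the abstract describes, and pooling over all remaining points yields the global geometric feature.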
Abstract (English): |
With the development of the intelligent manufacturing industry, many tasks in the industrial field need the assistance of intelligent robots, and pose estimation plays a key role in the interaction between robots and the environment. The study of pose estimation methods therefore has practical significance. In recent years, pose estimation technology has developed rapidly; however, the problems of texture-less objects and occlusion between objects have not been well solved. At present, many methods use RGB images to obtain key information for pose estimation and seldom consider the impact of illumination on RGB image quality, which makes it difficult to improve estimation accuracy. RGB-D images provide additional depth information on top of RGB information, which is conducive to high-precision pose estimation. In this thesis, texture-less objects and occluded objects are studied. The main research contents and innovations are as follows:

(1) To address the issue that existing dense fusion methods ignore the geometric features among local points of the point cloud, resulting in insufficient geometric feature extraction, an object pose estimation method based on feature fusion is proposed. First, the color features and point cloud features of the object are extracted from the RGB-D image. Second, fine local geometric features are extracted from the point cloud within each region through point set abstraction and extended to larger local regions, obtaining local geometric features at different levels and the global geometric features of the object. Finally, the color features and geometric features of the object are fused, and the trained neural network outputs the initial pose. Experiments on the LineMOD and YCB-Video datasets show that, compared with the baseline methods, the average pose estimation accuracy is improved by 0.5% - 31.5% on LineMOD and by 0.5% - 5.6% on YCB-Video.

(2) To address the issue that, in encoder-decoder feature fusion methods, the two feature extraction branches operate independently, which limits the representational ability of the extracted features, a pose estimation method based on improved encoder-decoder feature fusion is proposed. First, at each encoder and decoder layer, the RGB image branch uses convolutional layers to extract the object's color features, while the point cloud branch extracts attention features of the points within each neighborhood as local geometric features by constructing a k-nearest-neighbor directed graph. Second, the two kinds of features are fused between each encoder layer and decoder layer. Finally, pose estimation is completed. Experiments on the LineMOD and Occlusion LineMOD datasets show that, compared with the baseline methods, the average pose estimation accuracy is improved by 2.2% - 14.1% on LineMOD and by 5.6% - 21.5% on Occlusion LineMOD.

The proposed methods achieve excellent pose estimation accuracy for both texture-less and occluded objects, and have theoretical significance and application value for robot intelligence. |
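The point cloud branch of contribution (2) builds a k-nearest-neighbor directed graph and extracts attention features over each neighborhood. The sketch below is one plausible reading of that description (an EdgeConv-style edge MLP with a scalar attention weight per directed edge), offered as an illustration only; the module name, feature dimensions, and attention form are assumptions rather than the thesis design.

```python
# Minimal sketch (assumptions, not the thesis implementation) of a k-NN directed
# graph layer with attention-weighted edge features as local geometric features.
import torch
import torch.nn as nn

class KNNGraphAttention(nn.Module):
    def __init__(self, in_dim, out_dim, k=16):
        super().__init__()
        self.k = k
        self.edge_mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())
        self.attn = nn.Linear(out_dim, 1)  # scalar attention score per edge

    def forward(self, feats):
        # feats: (B, N, C); edges point from each node to its k nearest neighbours
        B, N, C = feats.shape
        # Drop index 0 of the top-(k+1) smallest distances: that is the point itself.
        idx = torch.cdist(feats, feats).topk(self.k + 1, largest=False).indices[..., 1:]
        neigh = torch.gather(
            feats.unsqueeze(1).expand(-1, N, -1, -1),
            2, idx.unsqueeze(-1).expand(-1, -1, -1, C))          # (B, N, k, C)
        center = feats.unsqueeze(2).expand(-1, -1, self.k, -1)   # (B, N, k, C)
        # EdgeConv-style edge feature: [center, neighbour - center]
        edge = self.edge_mlp(torch.cat([center, neigh - center], -1))  # (B, N, k, D)
        w = torch.softmax(self.attn(edge), dim=2)                # attention over k edges
        return (w * edge).sum(dim=2)                             # (B, N, D) local feature

layer = KNNGraphAttention(in_dim=3, out_dim=64, k=16)
pts = torch.rand(2, 512, 3)
print(layer(pts).shape)  # torch.Size([2, 512, 64])
```

In the architecture the abstract describes, the output of such a layer would be fused with the RGB branch's convolutional features between each encoder layer and decoder layer.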
CLC (Chinese Library Classification) number: | TP391 |
Open-access date: | 2022-06-21 |