Thesis title (Chinese): | Research on Monocular Camera Pose Estimation Methods Based on Deep Learning |
Name: | |
Student ID: | 20207040016 |
Confidentiality level: | Public |
Thesis language: | chi (Chinese) |
Discipline code: | 0810 |
Discipline: | Engineering - Information and Communication Engineering |
Student type: | Master's candidate |
Degree: | Master of Engineering |
Degree year: | 2023 |
Degree-granting institution: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research direction: | Computer Vision |
First supervisor: | |
First supervisor's institution: | |
Thesis submission date: | 2023-06-15 |
Thesis defense date: | 2023-06-02 |
Thesis title (English): | Research on monocular camera pose estimation methods based on deep learning |
Keywords (Chinese): | |
Keywords (English): | Camera pose estimation; Deep learning; Scene agnostic; Channel attention mechanism |
Abstract (Chinese): |
Camera pose estimation is an important component of tasks such as autonomous driving and robotics, and is the first step toward the core tasks of computer vision, laying the foundation for truly autonomous and automated systems. Traditional approaches to pose estimation rely mainly on geometric motion constraints, solving for the camera pose through a pipeline of steps such as feature extraction and matching. However, these methods are computationally expensive, susceptible to lighting and other factors, and lack robustness. With the rapid development of deep learning, camera pose estimation based on deep learning has become an active research topic.

To address the poor generalization and limited accuracy of existing deep-learning-based camera pose estimation models, a camera pose estimation algorithm based on multi-scale image features is proposed. The algorithm uses UNet as the feature extraction network, extracting multi-scale feature information from images. To obtain more accurate image features, the encoder of the backbone network is first replaced with ResNet50, strengthening the network's feature extraction capability so that image features are captured more accurately and efficiently. A channel attention mechanism is then added to the decoder of the backbone network, making the network focus on the important information in the image during feature extraction and preventing fine details from being overwhelmed by redundant information. Finally, generalization is improved by separating the model parameters from the pose optimizer, reducing the model's dependence on scene information.

To further address the low accuracy of pose estimation in challenging scenes, a camera pose estimation algorithm that incorporates image edge information is proposed. First, the camera pose-error loss is fused with a reprojection-error loss as the loss function of the improved network, strengthening the constraint the loss imposes in challenging scenes. Second, by fusing image edge information, points with large gradients play a greater role in camera pose estimation, helping the neural network find the correct gradient descent direction and improving estimation accuracy in challenging scenes.

Experiments are conducted on the public indoor 7-Scenes dataset and the outdoor Cambridge Landmarks dataset. The results show that the proposed algorithm achieves high-accuracy camera pose estimation. Compared with existing algorithms, the average rotation and translation errors are reduced by 6.5% and 4.0% on the indoor dataset, and by 22.7% and 11.1% on the outdoor dataset. The proposed algorithm demonstrates its advantages across different datasets. |
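The decoder-side channel attention described in the abstract can be illustrated with a minimal squeeze-and-excitation-style sketch. This is a hypothetical NumPy illustration of the mechanism, not the thesis implementation; the weight shapes, reduction ratio, and function names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """SE-style channel attention applied to a (C, H, W) feature map.

    w1: (C//r, C) squeeze weights, w2: (C, C//r) excitation weights
    (r is the bottleneck reduction ratio; both are hypothetical here).
    Returns the channel-reweighted feature map, same shape as feat.
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    desc = feat.mean(axis=(1, 2))
    # Excitation: bottleneck MLP with ReLU, then a sigmoid gate in (0, 1)
    hidden = np.maximum(0.0, w1 @ desc)
    gate = sigmoid(w2 @ hidden)
    # Reweight: scale each channel by its learned importance
    return feat * gate[:, None, None]
```

Because the gate lies in (0, 1) per channel, unimportant channels are suppressed while important ones pass through nearly unchanged, which is how fine detail is kept from being drowned out by redundant channels.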
Abstract (English): |
Camera pose estimation is an important component of tasks such as autonomous driving and robotics, and is the first step toward the core tasks of computer vision, laying the foundation for truly autonomous and automated systems. Traditional approaches to pose estimation rely mainly on geometric motion constraints, solving for the camera pose through a pipeline of steps such as feature extraction and matching. However, these methods are computationally expensive, susceptible to lighting and other factors, and lack robustness. With the rapid development of deep learning, camera pose estimation based on deep learning has become an active research topic. To address the poor generalization and limited accuracy of existing deep-learning-based camera pose estimation models, a camera pose estimation algorithm based on multi-scale image features is proposed. The algorithm uses UNet as the feature extraction network, which extracts multi-scale feature information from images. To obtain more accurate image features, the encoder of the backbone network is first replaced with ResNet50, strengthening the network's feature extraction capability so that image features are captured more accurately and efficiently. A channel attention mechanism is then added to the decoder of the backbone network, making the network focus on the important information in the image during feature extraction and preventing fine details from being overwhelmed by redundant information. Finally, generalization is improved by separating the model parameters from the pose optimizer, reducing the model's reliance on scene information. To further address the low accuracy of pose estimation in challenging scenes, a camera pose estimation algorithm that incorporates image edge information is proposed.
First, the algorithm fuses the camera pose-error loss with a reprojection-error loss as the loss function of the improved network, strengthening the constraint the loss imposes in challenging scenes. Second, by fusing image edge information, it increases the influence of points with large gradients on camera pose estimation, helping the neural network find the correct gradient descent direction and improving estimation accuracy in challenging scenes. Experiments are conducted on the public indoor 7-Scenes dataset and the outdoor Cambridge Landmarks dataset. The results show that the proposed algorithm achieves high-accuracy camera pose estimation. Compared with existing algorithms, the average rotation and translation errors are reduced by 6.5% and 4.0%, respectively, on the indoor dataset, and by 22.7% and 11.1%, respectively, on the outdoor dataset. The proposed algorithm demonstrates its advantages across different datasets. |
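The fused loss described above, with edge information boosting high-gradient points, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the Frobenius rotation distance, the `1 + gradient` weighting, and the `alpha` balance factor are hypothetical choices standing in for the thesis's actual formulation.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude of a grayscale image via 3x3 Sobel filters."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img.astype(float), 1, mode="edge")
    gx = np.zeros(img.shape)
    gy = np.zeros(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

def fused_loss(R_pred, t_pred, pts3d, pix_gt, R_gt, t_gt, K, img, alpha=1.0):
    """Pose-error loss plus an edge-weighted reprojection-error loss.

    R_*: (3, 3) rotations; t_*: (3,) translations; pts3d: (N, 3) scene
    points; pix_gt: (N, 2) ground-truth pixels; K: (3, 3) intrinsics;
    img: grayscale image. alpha is a hypothetical weighting factor.
    """
    # Direct pose error: translation distance + Frobenius rotation distance
    pose_err = np.linalg.norm(t_pred - t_gt) + np.linalg.norm(R_pred - R_gt)

    # Reprojection: world -> camera -> pixel coordinates
    cam = pts3d @ R_pred.T + t_pred
    pix = cam @ K.T
    pix = pix[:, :2] / pix[:, 2:3]

    # Edge weighting: residuals at high-gradient pixels count more
    grad = sobel_magnitude(img)
    u = np.clip(np.round(pix_gt[:, 0]).astype(int), 0, img.shape[1] - 1)
    v = np.clip(np.round(pix_gt[:, 1]).astype(int), 0, img.shape[0] - 1)
    weights = 1.0 + grad[v, u]
    reproj_err = (weights * np.linalg.norm(pix - pix_gt, axis=1)).mean()

    return pose_err + alpha * reproj_err
```

A perfect pose prediction drives both terms to zero; as the prediction drifts, edge points (large Sobel magnitude) contribute larger residuals, which is the mechanism the abstract credits for steering gradient descent in challenging scenes.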
CLC (Chinese Library Classification) number: | TP391.41 |
Release date: | 2023-06-15 |