Thesis information

Thesis title (Chinese):

 基于场景坐标回归的相机重定位方法研究    

Name:

 胡少毅    

Student ID:

 20207040025    

Confidentiality level:

 Public

Thesis language:

 chi    

Discipline code:

 0810    

Discipline name:

 Engineering - Information and Communication Engineering

Student type:

 Master's

Degree level:

 Master of Engineering

Degree year:

 2023    

Institution:

 Xi'an University of Science and Technology

School/Department:

 College of Communication and Information Engineering

Major:

 Information and Communication Engineering

Research direction:

 Computer vision

First supervisor name:

 王静    

First supervisor affiliation:

 Xi'an University of Science and Technology

Thesis submission date:

 2023-06-12    

Thesis defense date:

 2023-06-02    

Thesis title (English):

 Research on camera relocation method based on scene coordinate regression    

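Keywords (Chinese):

 scene coordinate regression ; camera relocalization ; depthwise over-parameterized convolution ; pyramid convolution ; coordinate attention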

Keywords (English):

 scene coordinate regression ; camera relocation ; depthwise over-parameterized convolution ; pyramid convolution ; coordinate attention    

Chinese abstract:

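Camera relocalization plays an important role in computer vision. With the rapid development in recent years of autonomous driving, virtual reality, and augmented reality, the performance of camera relocalization directly affects these applications, making it a key module of these technologies. Localization usually means computing the position and orientation (the pose) of a robot or sensor within a known environment, while camera relocalization refers specifically to recovering the camera's pose in a previously observed environment from a new viewpoint. Convolutional neural networks perform well across computer vision tasks. Traditional camera relocalization methods rely on hand-crafted features and are computationally expensive, which makes them inefficient and not robust; convolutional neural networks handle these problems better, so using them for camera relocalization has become a current research focus.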

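However, relocalization accuracy degrades easily when the environment contains low-texture objects with repetitive structures. To address this, the thesis takes a scene coordinate regression network as the base model. First, depthwise over-parameterized convolution replaces conventional convolution to strengthen the network's feature extraction. Second, fine-grained information is extracted after the feature extraction stage to counter the loss of spatial information during that stage. Finally, the network is arranged in a contracting-expanding structure that outputs scene coordinates, establishing the relationship between the two-dimensional image plane and three-dimensional space. In addition, camera relocalization performs worse outdoors than indoors because outdoor environments contain far more interference. To handle this, a voting-based segmentation algorithm filters scene noise by removing pixels that do not contribute to camera pose estimation; a residual network built from pyramid convolutions then serves as the backbone, reducing the size of the original backbone; finally, a coordinate attention mechanism is added to make the backbone encoder more robust to interference.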

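Experiments were run on the 7Scenes and 12Scenes indoor datasets and the Cambridge Landmarks outdoor dataset. Compared with the original network, the improved scene coordinate regression network raises average distance accuracy by 3.57% and average angle accuracy by 20.00% in indoor scenes; in outdoor scenes, average distance accuracy improves by 29.41% and average angle accuracy by 33.33%, and the model size shrinks from 236 MB to 170 MB. These results show that the improved scene coordinate regression network substantially improves camera relocalization accuracy.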

English abstract:

Camera relocalization plays a very important role in the field of computer vision. In recent years, technologies such as autonomous driving, virtual reality, and augmented reality have developed rapidly. The performance of camera relocalization affects the effectiveness of these applications, and it is a key module of these technologies. Localization usually refers to solving for the position and orientation, i.e. the pose, of a robot or sensor in a known environment; camera relocalization specifically refers to recovering the camera's pose in a previously observed environment from a new viewpoint. Convolutional neural networks have achieved good performance in various computer vision tasks. Traditional camera relocalization methods require manual feature extraction and heavy computation, resulting in low efficiency and poor robustness. Convolutional neural networks can handle these problems well. Therefore, using convolutional neural networks for camera relocalization has become a current research hotspot.

However, on the one hand, when the environment contains low-texture objects with repetitive structures, camera relocalization accuracy deteriorates easily. Therefore, this thesis selects the scene coordinate regression network as the base model. First, depthwise over-parameterized convolution is used instead of conventional convolution to improve the network's ability to extract features. Second, after feature extraction, fine-grained information is further extracted to address the loss of spatial information during the feature extraction process. Finally, the network is formed into a contracting-expanding structure that outputs scene coordinates, establishing a relationship from the two-dimensional image plane to three-dimensional space. On the other hand, because outdoor environments contain much more interference, camera relocalization performs worse in outdoor scenes than indoors. Therefore, in response to the increased interference in outdoor scenes, a voting segmentation algorithm is adopted to filter out noise in the scene, removing pixels that do not contribute to the camera pose calculation. Then, a residual network constructed from pyramid convolutions is used as the backbone network to reduce the size of the original backbone. Finally, a coordinate attention mechanism is added to improve the anti-interference ability of the backbone encoder.
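The two-dimensional to three-dimensional relationship mentioned above can be illustrated with a small sketch. It assumes a standard pinhole camera and a world-to-camera pose; the function names, the 10-pixel threshold, and the inlier-counting step are illustrative assumptions and are not taken from the thesis, which does not describe its pose solver in this record.

```python
import numpy as np

def reprojection_errors(scene_coords, pixels, K, R, t):
    """Pixel distance between observed pixels and reprojected scene coordinates.

    scene_coords: (N, 3) predicted 3D points in the world frame.
    pixels:       (N, 2) image positions that the predictions belong to.
    K:            (3, 3) pinhole intrinsics (assumed known).
    R, t:         world-to-camera rotation (3, 3) and translation (3,).
    """
    cam = scene_coords @ R.T + t        # world frame -> camera frame
    proj = cam @ K.T                    # apply intrinsics
    proj = proj[:, :2] / proj[:, 2:3]   # perspective division
    return np.linalg.norm(proj - pixels, axis=1)

def count_inliers(scene_coords, pixels, K, R, t, threshold_px=10.0):
    """Number of 2D-3D correspondences that a pose hypothesis explains.

    threshold_px is a hypothetical value chosen for illustration.
    """
    errors = reprojection_errors(scene_coords, pixels, K, R, t)
    return int(np.sum(errors < threshold_px))
```

A pose hypothesis that agrees with many predicted scene coordinates yields a high inlier count, which is how a robust estimator could rank candidate poses.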

The experiments were conducted on the 7Scenes and 12Scenes indoor datasets and the Cambridge Landmarks outdoor dataset. The experimental results show that in indoor scenes, compared to the original network, the improved scene coordinate regression network improves average distance accuracy by 3.57% and average angle accuracy by 20.00%; in outdoor scenes, the average distance accuracy of the model is improved by 29.41%, the average angle accuracy is improved by 33.33%, and the model size is reduced from 236 MB to 170 MB. The above research indicates that the improved scene coordinate regression network in this thesis can significantly improve camera relocalization accuracy.
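As a quick check of the one figure the abstract gives in absolute terms, the sketch below computes the relative reduction in model size from the reported 236 MB and 170 MB. The helper name is hypothetical. The accuracy percentages are not recomputed because the abstract does not state the underlying error values.

```python
def relative_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0

# Model sizes in MB, as reported in the abstract.
size_cut = relative_reduction(236.0, 170.0)
print(f"Model size reduction: {size_cut:.2f}%")  # prints 27.97%
```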

Chinese Library Classification (CLC) number:

 TP391.41    

Release date:

 2023-06-14    
