论文中文题名: | 基于深度学习的多视图三维重建算法研究 |
姓名: | |
学号: | 22207223096 |
保密级别: | 公开 |
论文语种: | chi |
学科代码: | 085400 |
学科名称: | 工学 - 电子信息 |
学生类型: | 硕士 |
学位级别: | 工学硕士 |
学位年度: | 2025 |
培养单位: | 西安科技大学 |
院系: | |
专业: | |
研究方向: | 计算机视觉 |
第一导师姓名: | |
第一导师单位: | |
论文提交日期: | 2025-06-16 |
论文答辩日期: | 2025-06-06 |
论文外文题名: | Research on Multi-View 3D Reconstruction Algorithm Based on Deep Learning |
论文中文关键词: | |
论文外文关键词: | Multi-view 3D reconstruction; Deep learning; Depthwise separable convolution; Attention mechanism; Recurrent neural network |
论文中文摘要: |
伴随计算机视觉以空前速度迭代演进,多视图三维重建技术作为该领域的研究热点之一,在虚拟现实、自动驾驶、文化遗产保护等领域有着广泛应用。多视图三维重建旨在利用多视角图像重构出现实场景的三维结构与几何信息。相较于传统方法,基于深度学习的方法在深度估计上取得显著进展。然而,现有方法在弱纹理、非朗伯表面和遮挡等复杂场景下,仍面临重建结果不完整、对不同场景的泛化能力较差,以及模型参数量大幅增加等挑战。

针对上述挑战,本文将多视图三维重建技术作为研究对象,以提高重建完整度和整体效率为目标,着重从多视图特征提取优化和三维卷积神经网络(3D CNN)结构轻量化两方面展开深入分析与研究,具体内容如下:

(1)以CasMVSNet网络为基准,本文提出一种结合并行卷积-注意力块与特征聚合模块的多尺度特征提取网络。在特征金字塔网络的顶层跳跃连接中,设计了基于深度可分离卷积和自注意力机制的并行卷积-注意力块,利用不同类型的卷积操作捕捉局部细节与全局上下文信息,同时基于自注意力机制动态调整特征权重,实现多尺度特征的高效提取与融合。此外,在特征金字塔网络末端增设基于通道注意力机制的特征聚合模块,突出关键特征,从而更准确地捕捉和利用有效特征信息,为正则化阶段提供更高质量的输入。实验结果表明,在DTU数据集上,重建点云的完整度和整体度较基准网络分别提升了26.23%和6.48%,改善了复杂场景下重建不完整和可视化效果差的问题。在Tanks and Temples数据集上,平均F-score值达到62.83,较基准网络提升了10.54%,表现出较强的泛化性能。

(2)针对3D CNN处理高分辨率数据时内存占用高、计算成本大的问题,提出一种混合循环正则化网络。该方法融合了2D U-Net架构与循环神经网络的优势,依据重建任务的阶段性特征,采用差异化的模块组进行正则化。在初始阶段,鉴于图像分辨率较低且待估计深度面数量较多,利用Hybrid Unet-ConvLSTMCell模块沿深度方向进行正则化;在后续优化阶段,考虑到图像分辨率较高且待估计深度面数量较少,通过Hybrid Unet-ConvGRU模块进行正则化。实验结果表明,在DTU数据集上,重建点云的完整度较基准网络提升了26.49%,该策略充分利用空间上下文信息,保证重建效果的同时,显著减少了网络参数量和内存消耗,有效缓解了传统3D CNN显存消耗问题。

本文提出的基于多尺度特征提取与混合循环正则化的多视图三维重建方法能够提取更丰富的深度信息,提升特征表征质量,有效抑制边缘和背景噪声,同时降低计算成本并增强对高分辨率数据的适应能力,从而优化重建效果。 |
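Below is a minimal, self-contained PyTorch sketch of the kind of parallel convolution-attention block and channel-attention aggregation module described in the abstract. It is an illustration under assumed channel sizes and layer choices, not the thesis implementation; the class names `DepthwiseSeparableConv`, `ParallelConvAttentionBlock`, and `ChannelAttentionAggregation` are hypothetical.

```python
# Illustrative sketch only: a parallel convolution-attention block (depthwise
# separable convolution branch for local detail + self-attention branch for
# global context) and an SE-style channel-attention aggregation module.
# All names and hyper-parameters here are assumptions for demonstration.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class ParallelConvAttentionBlock(nn.Module):
    """Runs a local convolution branch and a global self-attention branch in
    parallel, then fuses the two feature maps with a 1x1 convolution."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.local_branch = DepthwiseSeparableConv(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):                                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        local = self.local_branch(x)                         # local details
        tokens = x.flatten(2).transpose(1, 2)                # (B, H*W, C)
        ctx, _ = self.attn(tokens, tokens, tokens)           # global context
        ctx = ctx.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, ctx], dim=1))


class ChannelAttentionAggregation(nn.Module):
    """SE-style channel attention that re-weights feature channels so the
    more informative ones dominate the aggregated output."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)


if __name__ == "__main__":
    feat = torch.randn(1, 32, 32, 40)                        # a coarse FPN feature map
    feat = ParallelConvAttentionBlock(32)(feat)
    feat = ChannelAttentionAggregation(32)(feat)
    print(feat.shape)                                        # torch.Size([1, 32, 32, 40])
```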
论文外文摘要: |
As computer vision evolves at an unprecedented pace, multi-view 3D reconstruction, one of the research hotspots in the field, is widely applied in virtual reality, autonomous driving, and cultural heritage preservation. Multi-view 3D reconstruction aims to recover the 3D structure and geometric information of a real scene from images taken from multiple viewpoints. Compared with traditional methods, deep learning-based methods have made significant progress in depth estimation. However, in complex scenes with weak textures, non-Lambertian surfaces, and occlusions, existing methods still suffer from incomplete reconstruction, poor generalization to different scenes, and a sharp increase in the number of model parameters.

To address these challenges, this thesis takes multi-view 3D reconstruction as its research object and aims to improve reconstruction completeness and overall efficiency, focusing on two aspects: optimizing multi-view feature extraction and lightweighting the 3D convolutional neural network (3D CNN) regularization structure. The main work is as follows:

(1) Taking CasMVSNet as the baseline, this thesis proposes a multi-scale feature extraction network that combines a parallel convolution-attention block with a feature aggregation module. In the top-level skip connection of the feature pyramid network, a parallel convolution-attention block based on depthwise separable convolution and a self-attention mechanism is designed: different types of convolution operations capture local details and global context, while the self-attention mechanism dynamically adjusts feature weights, enabling efficient extraction and fusion of multi-scale features. In addition, a feature aggregation module based on channel attention is added at the end of the feature pyramid network to highlight key features, so that effective feature information is captured and exploited more accurately and higher-quality input is provided for the regularization stage. Experiments show that on the DTU dataset the completeness and overall scores of the reconstructed point cloud improve by 26.23% and 6.48%, respectively, over the baseline network, alleviating incomplete reconstruction and poor visual quality in complex scenes. On the Tanks and Temples dataset the mean F-score reaches 62.83, a 10.54% improvement over the baseline, demonstrating strong generalization.

(2) To address the high memory usage and computational cost of 3D CNNs on high-resolution data, a hybrid recurrent regularization network is proposed. The method combines the advantages of the 2D U-Net architecture and recurrent neural networks, and applies different module groups for regularization according to the stage of the reconstruction task. In the initial stage, where the image resolution is low but the number of depth hypothesis planes is large, the Hybrid Unet-ConvLSTMCell module regularizes along the depth direction; in the subsequent refinement stages, where the image resolution is high but the number of depth hypothesis planes is small, the Hybrid Unet-ConvGRU module is used. Experiments show that on the DTU dataset the completeness of the reconstructed point cloud improves by 26.49% over the baseline network. The strategy makes full use of spatial context, preserves reconstruction quality while significantly reducing the number of network parameters and memory consumption, and effectively mitigates the GPU memory bottleneck of traditional 3D CNNs.

The multi-view 3D reconstruction methods proposed in this thesis, based on multi-scale feature extraction and hybrid recurrent regularization, extract richer depth information, improve the quality of feature representations, effectively suppress edge and background noise, and at the same time reduce computational cost and enhance adaptability to high-resolution data, thereby improving reconstruction results. |
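As a companion illustration of the regularization idea, the following sketch sweeps a plane-sweep cost volume along the depth dimension with a lightweight ConvGRU cell, so that only one 2D slice is processed at a time instead of convolving the whole volume with a 3D CNN. It is a simplified stand-in under assumed shapes and channel counts, not the thesis's Hybrid Unet-ConvLSTMCell/Unet-ConvGRU modules; the names `ConvGRUCell` and `regularize_cost_volume` are hypothetical.

```python
# Illustrative sketch only: recurrent regularization of a cost volume along the
# depth dimension with a ConvGRU cell, as a memory-friendly stand-in for a 3D CNN.
# Shapes, channel counts, and function names are assumptions for demonstration.
import torch
import torch.nn as nn


class ConvGRUCell(nn.Module):
    """A convolutional GRU cell operating on 2D feature maps."""
    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, kernel_size, padding=pad)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, kernel_size, padding=pad)
        self.hid_ch = hid_ch

    def forward(self, x, h):
        if h is None:                                        # initialise hidden state
            h = x.new_zeros(x.size(0), self.hid_ch, x.size(2), x.size(3))
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde                     # updated hidden state


def regularize_cost_volume(cost_volume, cell, head):
    """cost_volume: (B, C, D, H, W). Sweeps the D depth hypotheses one 2D slice
    at a time and returns per-plane scores of shape (B, D, H, W)."""
    h, scores = None, []
    for d in range(cost_volume.size(2)):
        h = cell(cost_volume[:, :, d], h)                    # 2D recurrence along depth
        scores.append(head(h))                               # (B, 1, H, W) for plane d
    return torch.cat(scores, dim=1)


if __name__ == "__main__":
    cell = ConvGRUCell(in_ch=8, hid_ch=8)
    head = nn.Conv2d(8, 1, 3, padding=1)                     # per-plane matching score
    cost = torch.randn(1, 8, 48, 32, 40)                     # 48 depth hypotheses
    prob = torch.softmax(regularize_cost_volume(cost, cell, head), dim=1)
    print(prob.shape)                                        # torch.Size([1, 48, 32, 40])
```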
参考文献: |
[8] 王江安, 黄乐, 庞大为, 等. 基于自适应聚合循环递归的稠密点云重建网络[J]. 图学学报, 2024, 45(01): 230-239.
[9] 童伟, 张苗苗, 李东方, 等. 基于边缘辅助极线Transformer的多视角场景重建[J]. 电子与信息学报, 2023, 45(10): 3483-3491.
[10] 王敏, 赵明富, 宋涛, 等. 基于特征聚合Transformer的多视图立体重建方法[J]. 激光与光电子学进展, 2024, 61(14): 181-190.
[19] 樊铭瑞, 申冰可, 牛文龙, 等. 基于深度学习的多视图立体视觉综述[J]. 软件学报, 2025, 36(04): 1692-1714.
[39] 鄢化彪, 徐方奇, 黄绿娥, 等. 基于深度学习的多视图立体重建方法综述[J]. 光学精密工程, 2023, 31(16): 2444-2464.
[41] 陈暄, 吴吉义. 基于优化卷积神经网络的车辆特征识别算法研究[J]. 电信科学, 2023, 39(10): 101-111.
[43] 朱光照, 韦博, 杨阿峰, 等. 基于自注意力机制的多视图三维重建方法[J]. 激光与光电子学进展, 2023, 60(16): 323-330.
[55] 孙凯, 张成, 詹天, 等. 融合注意力机制和多层动态形变卷积的多视图立体视觉重建方法[J]. 兵工学报, 2024, 45(10): 3631-3641. |
中图分类号: | TP391.4 |
开放日期: | 2025-06-16 |