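Thesis title (Chinese): | Research on Video Dynamic Object Elimination Method |
Name: | |
Student ID: | 21208223056 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 085400 |
Discipline name: | Engineering - Electronic Information |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2021 |
Degree-granting institution: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research direction: | Computer vision |
First supervisor name: | |
First supervisor affiliation: | |
Second supervisor name: | |
Thesis submission date: | 2024-12-12 |
Thesis defense date: | 2024-12-03 |
Thesis title (foreign language): | Research on Video Dynamic Object Elimination Method |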
Keywords (Chinese): | Dynamic object localization ; video segmentation ; video hole repair ; shared dynamic mask ; multi-scale information aggregation |
Keywords (foreign language): | Dynamic target recognition ; video segmentation ; video hole repair ; shared dynamic masks ; multi-scale information aggregation |
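Abstract (Chinese): |
Video dynamic object elimination first localizes the dynamic objects and then repairs the target regions. Localizing dynamic objects requires both segmentation and motion detection; target-region repair fills the target regions using the static background, yielding complete image frames that contain only static background. The technique is widely used in video editing, augmented reality, 3D reconstruction, and dynamic SLAM. The main research contents are as follows: (1) Video dynamic object localization method. Existing video inpainting methods usually repair only the regions specified by a mask and cannot automatically locate and repair all dynamic object regions in a video. This thesis proposes a video dynamic object localization method that comprises object segmentation and motion detection, localizing dynamic objects automatically and laying the foundation for the subsequent repair. To improve segmentation accuracy for potential dynamic objects, a feature pyramid and the PSA attention mechanism are introduced into the segmentation algorithm; they aggregate long-range contextual information at different scales, so that potential dynamic objects are segmented accurately. Kalman filtering is then used to detect each object's motion state, which completes the localization of dynamic objects. In experiments and comparisons on the YouTube VOS 2019 and DAVIS 2017 datasets, the proposed method localizes dynamic objects in video accurately and, compared with methods such as CompFeat, improves AP, AP50, and AP75 by 5.5%, 3.7%, and 7.9% respectively, outperforming mainstream algorithms. (2) Multi-scale information aggregation repair method. Current repair methods tend to produce blurred textures or distorted structures on videos with significant background changes, severely deformed dynamic objects, or complex motion. This thesis proposes a repair algorithm based on multi-scale information aggregation. The method adopts a generative adversarial network structure; its generator is built from region normalization and gated convolution, which effectively addresses mean and variance shift and improves repair accuracy. The multi-scale feature extraction capability of an improved Inception module is used to perform convolution and feature aggregation at multiple scales, keeping textures sharp when large holes are filled. The decoder uses the Mish and ELU activation functions, which reduces information loss and strengthens the network's generalization. Experimental results on datasets including YouTube VOS 2019, DAVIS 2017, and Paris Street View show that, compared with algorithms such as STTN, the proposed method repairs large holes with clear textures and no obvious structural distortion, with a maximum improvement of 14.08% in PSNR and 3.4% in SSIM. (3) Video dynamic object elimination system. A video dynamic object elimination system was designed and implemented. The system combines the two proposed methods and achieves precise localization and effective elimination of dynamic objects. The elimination results can provide basic support for fields such as 3D reconstruction and dynamic SLAM. The system also provides a user-friendly visual interface for operation and analysis. |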
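The motion-detection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each segmented object is reduced to a per-frame mask centroid, that a constant-velocity Kalman filter tracks each image axis, that the camera is static, and that an object counts as dynamic when its estimated speed exceeds a threshold. The names `AxisKalman` and `is_dynamic`, the noise values, and the threshold are hypothetical and are not taken from the thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (state: position, velocity)."""
    pos: float
    vel: float = 0.0
    # Symmetric covariance matrix [[p00, p01], [p01, p11]].
    p00: float = 1.0
    p01: float = 0.0
    p11: float = 1.0
    q: float = 1e-2  # process noise (assumed value)
    r: float = 1.0   # measurement noise (assumed value)

    def step(self, z: float, dt: float = 1.0) -> None:
        # Predict: x = F x and P = F P F^T + Q, with F = [[1, dt], [0, 1]].
        pos = self.pos + dt * self.vel
        vel = self.vel
        p00 = self.p00 + dt * (2.0 * self.p01 + dt * self.p11) + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        # Update with a position-only measurement z (H = [1, 0]).
        s = p00 + self.r
        k0, k1 = p00 / s, p01 / s
        y = z - pos
        self.pos = pos + k0 * y
        self.vel = vel + k1 * y
        self.p00 = (1.0 - k0) * p00
        self.p01 = (1.0 - k0) * p01
        self.p11 = p11 - k1 * p01


def is_dynamic(centroids, speed_threshold=0.5):
    """Return True if the filtered speed (pixels per frame) exceeds the threshold."""
    (x0, y0), rest = centroids[0], centroids[1:]
    fx, fy = AxisKalman(pos=x0), AxisKalman(pos=y0)
    for x, y in rest:
        fx.step(x)
        fy.step(y)
    return math.hypot(fx.vel, fy.vel) > speed_threshold
```

Under these assumptions, a centroid track that stays in place keeps an estimated velocity of zero, while a track that drifts steadily across frames quickly exceeds the threshold. A moving camera would require compensating for background motion first, which this sketch does not attempt.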
Abstract (foreign language): |
Video dynamic object elimination refers to automatically localizing the dynamic objects in a video and repairing the target regions with the static background, thereby obtaining complete image frames that contain only static background. The technique has a wide range of applications in fields such as video editing, augmented reality, 3D reconstruction, and dynamic SLAM. This thesis studies dynamic object elimination methods. The main research contents are as follows: (1) Video dynamic object localization method. Existing video inpainting methods usually repair only mask-specified regions and cannot automatically locate all dynamic objects in a video. This thesis proposes a video dynamic object localization method that integrates segmentation and motion detection, localizing dynamic objects automatically and laying the foundation for the subsequent repair. Attention mechanisms are introduced to aggregate long-range contextual information, which improves segmentation accuracy for irregular, non-rigid objects and allows potential dynamic objects to be segmented accurately. Kalman filtering is then used to detect each object's motion state, so that dynamic objects are identified accurately. Experiments and comparisons were conducted on the YouTube VOS 2019 and DAVIS 2017 datasets. Compared with CompFeat and other methods, the proposed method accurately locates dynamic objects in videos, with improvements of 5.5%, 3.7%, and 7.9% in AP, AP50, and AP75, respectively, outperforming mainstream algorithms. (2) Multi-scale information aggregation repair method. When dealing with videos that have significant background changes, severely deformed dynamic objects, or complex motion, current repair methods are prone to blurred textures or structural distortion. This thesis proposes a multi-scale information aggregation repair algorithm. The method uses a generative adversarial network structure whose generator is built from region normalization and gated convolution, which effectively addresses mean and variance shift and improves repair accuracy. Using the multi-scale feature extraction capability of an improved Inception module, convolution and feature aggregation are performed at multiple scales to keep textures clear while filling large holes. The decoder uses the Mish and ELU activation functions to reduce information loss and enhance the network's generalization. Experimental results on datasets such as YouTube VOS 2019, DAVIS 2017, and Paris Street View show that, compared with algorithms such as STTN, the proposed method produces clear textures and no obvious structural distortion when repairing large holes, with a maximum improvement of 14.08% in PSNR and 3.4% in SSIM. (3) Video dynamic object elimination system. A video dynamic object elimination system was designed and implemented. The system combines the two proposed methods and achieves precise localization and effective elimination of dynamic objects. The elimination results can provide basic support for fields such as 3D reconstruction and dynamic SLAM. In addition, the system provides a user-friendly visual interface that facilitates operation and analysis. |
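For readers unfamiliar with the PSNR metric quoted in the abstract, the following minimal sketch shows the standard definition, PSNR = 10 log10(MAX^2 / MSE), for two grayscale frames stored as nested lists. The function names `psnr` and `relative_gain` are hypothetical. Treating the reported percentage as a relative gain over a baseline score is only one plausible reading, not something the abstract states.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale frames."""
    same_rows = len(reference) == len(restored)
    if not same_rows or any(len(a) != len(b) for a, b in zip(reference, restored)):
        raise ValueError("frames must have the same shape")
    n = sum(len(row) for row in reference)
    if n == 0:
        raise ValueError("frames must not be empty")
    # Mean squared error over all pixels.
    sq_err = sum(
        (a - b) ** 2
        for row_a, row_b in zip(reference, restored)
        for a, b in zip(row_a, row_b)
    )
    mse = sq_err / n
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_gain(baseline, ours):
    """Relative improvement of `ours` over `baseline`, in percent (assumed reading)."""
    return 100.0 * (ours - baseline) / baseline
```

Higher PSNR means the restored frame is closer to the reference. Because the value is in decibels, small numeric differences correspond to noticeable changes in mean squared error.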
Chinese Library Classification number: | TP391.4 |
Release date: | 2024-12-13 |