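Thesis title (Chinese): | Research on Target Tracking Algorithms in Complex Scenes |
Name: | |
Student ID: | 20208049012 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 081203 |
Discipline name: | Engineering - Computer Science and Technology (degrees may be conferred in engineering or science) - Computer Application Technology |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2023 |
Training institution: | Xi'an University of Science and Technology |
School/department: | |
Major: | |
Research direction: | Computer vision |
First supervisor name: | |
First supervisor affiliation: | |
Thesis submission date: | 2023-06-14 |
Thesis defense date: | 2023-06-06 |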
Thesis title (foreign language): | Research on target tracking algorithms in complex scenes |
Keywords (Chinese): | Target tracking ; Correlation filtering ; Adaptive spatio-temporal regularization ; Dilated convolution ; Transformer |
Keywords (foreign language): | Target tracking ; Adaptive spatio-temporal regularization ; Correlation filtering |
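Thesis abstract (Chinese): |
In recent years, target tracking technology has made rapid progress, but in complex scenes the target is often subject to several interfering factors at once, such as scale change, deformation, occlusion, and background clutter, which degrade the accuracy and robustness of tracking algorithms. Research on target tracking algorithms in complex scenes is therefore of considerable importance. Building on correlation filtering and deep learning theory, this thesis designs two target tracking algorithms for different application requirements. The main work is as follows: (1) When the target undergoes scale change or occlusion, correlation filter tracking algorithms cannot dynamically adjust the scale of the tracking box and lack an occlusion judgment mechanism, which leads to low tracking accuracy. To address these problems, this thesis proposes an occlusion-resistant target tracking algorithm with adaptive spatio-temporal regularization. Exploiting the strong correlation between image depth and scale, a depth-scale estimation model is established that estimates the scale value from the target's depth value, achieving scale-adaptive tracking. Next, the average peak-to-correlation energy and the maximum response peak are combined to judge whether the target is occluded; when occlusion occurs, a Kalman filter is used to relocate the target. Finally, a spatial regularization term and an adaptive temporal regularization term are introduced to train the correlation filter, further improving accuracy, and the alternating direction method of multipliers is used to solve for the filter quickly, preserving real-time performance. Comparative analysis on the OTB-100 dataset shows a tracking precision of 71.6% and a success rate of 54.2%. Compared with mainstream algorithms, the algorithm offers good occlusion resistance on top of adaptive scale handling. (2) When the target undergoes scale change, deformation, or occlusion, or sits in a cluttered background, among other more complex conditions, Siamese network tracking algorithms lack the ability to model the target globally, which leads to low tracking robustness. To address these problems, this thesis proposes a multi-scale target tracking algorithm based on global context. First, the purpose-designed MF-SwinTransformer feature extraction network extracts global-context features from the input image, and multiple dilated convolutions are fused to sample the feature map so that the network attends to the target's multi-scale information. Next, a joint position encoding layer is introduced into the Transformer encoder to dynamically generate length-adaptive position encodings, enabling precise localization of the target. Finally, the tracking task is completed through classification and regression. Comparative analysis on the GOT-10k and LaSOT datasets shows a tracking precision of 64.5% and a success rate of 65.8%. Compared with mainstream algorithms, the algorithm tracks well in complex scenes involving scale change, deformation, occlusion, and background clutter. |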
Thesis abstract (foreign language): |
In recent years, target tracking has made rapid progress, but in complex scenes it often faces several interfering factors at once, such as scale changes, deformations, occlusions, and cluttered backgrounds, which degrade the accuracy and robustness of tracking algorithms. Studying target tracking algorithms in complex scenes is therefore of great significance. Based on correlation filtering and deep learning theory, this thesis designs two target tracking algorithms for different application requirements. The main contents of this work are as follows: (1) Correlation filter tracking algorithms cannot dynamically adjust the scale of the tracking box and lack an occlusion detection mechanism, resulting in low tracking accuracy when scale changes and occlusions occur. To address these issues, this thesis proposes an occlusion-resistant target tracking algorithm with adaptive spatio-temporal regularization. A depth-scale estimation model is established from the strong correlation between image depth and scale; it estimates the scale value from the depth value of the target, achieving scale-adaptive tracking. Then the average peak-to-correlation energy and the maximum response peak are used jointly to detect occlusion. When occlusion occurs, a Kalman filter is employed to relocate the target. Finally, a spatial regularization term and an adaptive temporal regularization term are introduced to train the correlation filter, further improving accuracy. The alternating direction method of multipliers is used to solve for the filter quickly, preserving real-time performance. Comparative analysis on the OTB-100 dataset shows that the tracking precision reaches 71.6% and the success rate reaches 54.2%. Compared with mainstream algorithms, the proposed algorithm offers good occlusion resistance on top of adaptive scale handling. 
(2) Siamese network-based target tracking algorithms that use convolutional neural networks for feature extraction tend to lose fine-grained target details in more complex scenarios involving scale changes, deformations, occlusions, and cluttered backgrounds. In addition, the network lacks the ability to model the target globally, leading to low tracking robustness. To address these issues, this thesis proposes a multi-scale target tracking algorithm based on global context. First, an MF-Swin Transformer feature extraction network is designed to extract global contextual features from the input images. Multiple dilated convolutions are fused to sample the feature maps, allowing the network to attend to multi-scale information about the target. Then, a joint position encoding layer is introduced into the Transformer encoder to dynamically generate length-adaptive position encodings for precise target localization. Finally, target tracking is achieved through classification and regression. Comparative analysis on the GOT-10k and LaSOT benchmarks shows that the tracking precision reaches 64.5% and the success rate reaches 65.8%. Compared with mainstream algorithms, the proposed algorithm demonstrates good tracking performance in complex scenarios involving scale changes, deformations, occlusions, and cluttered backgrounds. |
Chinese Library Classification number: | TP391 |
Open access date: | 2023-06-14 |
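
Part (1) of the abstract judges occlusion by combining the average peak-to-correlation energy (APCE) with the maximum response peak. The sketch below illustrates one common way such a check can be written. It is not the thesis's implementation: the function names, the ratio thresholds `beta_apce` and `beta_peak`, and the rule that both measures must fall below a fraction of their historical means are all assumptions made for illustration. APCE itself follows the usual definition, (F_max - F_min)^2 divided by the mean of (F - F_min)^2 over the response map.

```python
from statistics import fmean


def apce(response):
    """Average peak-to-correlation energy of a 2-D response map (list of rows)."""
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    # Mean squared deviation of every response value from the minimum.
    energy = fmean([(v - f_min) ** 2 for v in values])
    if energy == 0.0:
        return 0.0  # flat map: there is no distinguishable peak
    return (f_max - f_min) ** 2 / energy


def is_occluded(response, apce_history, peak_history,
                beta_apce=0.5, beta_peak=0.6):
    """Flag occlusion when APCE and the peak both drop well below their past means.

    beta_apce and beta_peak are hypothetical ratios, not values from the thesis.
    """
    if not apce_history or not peak_history:
        return False  # no history yet, so nothing to compare against
    current_apce = apce(response)
    current_peak = max(max(row) for row in response)
    return (current_apce < beta_apce * fmean(apce_history)
            and current_peak < beta_peak * fmean(peak_history))


if __name__ == "__main__":
    sharp = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    blurred = [[0.2, 0.3, 0.2], [0.3, 0.3, 0.3], [0.2, 0.3, 0.2]]
    history_apce = [apce(sharp), apce(sharp)]
    history_peak = [1.0, 1.0]
    print(is_occluded(sharp, history_apce, history_peak))
    print(is_occluded(blurred, history_apce, history_peak))
```

A sharp single-peak response yields a high APCE, while a flattened, multi-modal response yields a low one. When the check fires, the abstract's pipeline would hand over to the Kalman filter to relocate the target rather than trusting the response peak.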