Thesis Information

Chinese Title:

 面向安防系统的弱监督货物图像分类算法研究 (Research on Weakly Supervised Cargo Image Classification Algorithms for Security Systems)

Name:

 Yang Chen (杨晨)

Student ID:

 21208088021

Confidentiality Level:

 Public

Language:

 Chinese

Discipline Code:

 083500

Discipline:

 Engineering - Software Engineering

Student Type:

 Master's

Degree:

 Master of Engineering

Degree Year:

 2024

Institution:

 Xi'an University of Science and Technology

School:

 College of Computer Science and Technology

Major:

 Software Engineering

Research Direction:

 Image Recognition

First Supervisor:

 Li Aiguo (李爱国)

First Supervisor's Institution:

 Xi'an University of Science and Technology

Submission Date:

 2024-06-16

Defense Date:

 2024-05-31

English Title:

 Research on Weakly Supervised Cargo Image Classification Algorithms for Security Systems

Chinese Keywords:

 Deep Learning; Image Recognition; Security System; Multi-branch Network; Model Compression

English Keywords:

 Deep learning; Image recognition; Security system; Multi-branch networks; Model compression

Chinese Abstract:

In a certain type of integrated security system, the loss or tampering of cargo can lead to major losses, so the cargo recognition module faces the challenge of achieving extremely high recognition accuracy. Current recognition methods are constrained in several ways, including a lack of sufficient cargo data, insufficient use of image features, and only subtle differences between actual cargo categories. These challenges limit the accuracy of existing recognition methods to a certain extent. To address these problems, this thesis studies the cargo recognition algorithms of this integrated security system in depth. The main work is as follows:

(1) To address the challenges of cargo recognition in the current integrated security system, namely that existing image recognition methods fail to fully utilize features and that the subtle differences between cargo image categories lead to insufficient recognition accuracy, a multi-branch network strategy is adopted to fully extract the multi-granularity features hidden in images, and the AGMG-Net (Attention-Guided Multi-Granularity feature fusion Network) cargo recognition algorithm is proposed. In this algorithm, one branch network extracts coarse-grained features, another branch extracts fine-grained features under the guidance of an attention model, and a feature fusion branch then combines the multi-granularity features, making full use of the image features to achieve high recognition accuracy. Experimental results show that AGMG-Net achieves average recognition accuracies of 88.57%, 92.73%, and 99.58% on the public datasets Butterfly20 and Flower and the self-built cargo dataset Cargo, respectively, outperforming mainstream image recognition algorithms such as ResNeSt, CoAtNet, and DeepMAD.

(2) To address AGMG-Net's long inference time and large parameter and computation counts, which result in low computational efficiency and difficult deployment, the AGMG-lite (lightweight AGMG-Net) cargo recognition algorithm is proposed. By optimizing the network structure and adjusting the backbone, the algorithm compresses the model parameters and reduces model complexity; it also introduces frequency-domain learning and sample balancing, preserving the model's feature learning ability and recognition accuracy while significantly reducing the model's computation and parameter counts. Experimental results show that on the public datasets Butterfly20 and Flower and the self-built cargo dataset Cargo, AGMG-lite achieves recognition accuracies of 88.53%, 92.16%, and 99.56%, respectively, with only 78.43M parameters and 14.82 GFLOPs, reductions of approximately 85% and 48% compared with AGMG-Net, and an inference speed of 4.7 milliseconds per image, an improvement of approximately 42.3%. These results demonstrate that, with recognition accuracy essentially unchanged, AGMG-lite significantly reduces parameter and computation counts, supporting efficient deployment on edge devices and demonstrating strong applicability.

From the two perspectives of classification accuracy and the algorithm's parameter and computation counts, this thesis improves and optimizes the cargo recognition models of this integrated security system, establishes corresponding network structures for the identified problems, and finally validates them through experiments, achieving the intended research goals.

English Abstract:

In a specific type of integrated security system, the loss or tampering of goods can result in significant losses, so the goods recognition module faces the challenge of achieving extremely high recognition accuracy. Current recognition methods face several limitations, including a lack of sufficient goods data, insufficient exploitation of image features, and only subtle differences between real goods categories. These challenges restrict the accuracy of existing recognition methods to a certain extent. To address these issues, this study conducts an in-depth investigation into goods recognition algorithms for this integrated security system. The primary contributions of this study are as follows:

(1) To tackle the challenges of goods recognition in the current integrated security system, namely that existing image recognition methods fail to fully utilize image features and that the subtle differences between goods image categories lead to inadequate recognition accuracy, we adopt a multi-branch network strategy to thoroughly extract the multi-granularity features hidden within images and propose the AGMG-Net (Attention-Guided Multi-Granularity feature fusion Network) goods recognition algorithm. One branch of this network extracts coarse-grained features, while another branch employs an attention-guided mechanism to extract fine-grained features; a feature fusion branch then combines the multi-granularity features, fully leveraging the image features to achieve high-accuracy recognition. Experimental results demonstrate that AGMG-Net achieves average recognition accuracies of 88.57%, 92.73%, and 99.58% on the public datasets Butterfly20 and Flower and the self-built Cargo dataset, respectively, outperforming mainstream image recognition algorithms such as ResNeSt, CoAtNet, and DeepMAD.
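
To make the two-branch-plus-fusion design concrete, the following PyTorch sketch shows one way such an attention-guided multi-granularity classifier could be assembled. It is an illustrative approximation only: the thesis publishes no code here, and the ResNet-50 backbones, feature dimensions, and the attention_crop stand-in (a fixed center crop in place of the learned attention-guided region) are all assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class MultiGranularityNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Coarse-grained branch: a standard CNN backbone over the full image.
        self.coarse = models.resnet50(weights=None)
        self.coarse.fc = nn.Identity()  # expose the 2048-d pooled features
        # Fine-grained branch: in AGMG-Net this branch is fed an
        # attention-guided region; the backbone choice here is an assumption.
        self.fine = models.resnet50(weights=None)
        self.fine.fc = nn.Identity()
        # Fusion branch: combine both granularities before classification.
        self.fuse = nn.Sequential(
            nn.Linear(2048 * 2, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes),
        )

    def attention_crop(self, x):
        # Stand-in for attention-guided localization: AGMG-Net selects the
        # most discriminative region from an attention map; here we simply
        # center-crop and upsample so the sketch stays self-contained.
        _, _, h, w = x.shape
        crop = x[:, :, h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]
        return F.interpolate(crop, size=(h, w), mode="bilinear", align_corners=False)

    def forward(self, x):
        f_coarse = self.coarse(x)                   # coarse-grained features
        f_fine = self.fine(self.attention_crop(x))  # fine-grained features
        return self.fuse(torch.cat([f_coarse, f_fine], dim=1))

logits = MultiGranularityNet(num_classes=20)(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 20])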

(2) To address AGMG-Net's long inference time and large parameter and computation counts, which lead to low computational efficiency and difficult deployment, we propose the AGMG-lite (lightweight AGMG-Net) goods recognition algorithm. This algorithm compresses the model parameters and reduces model complexity by optimizing the network structure and adjusting the backbone; it further introduces frequency-domain learning and sample balancing, preserving the model's feature learning capability and recognition accuracy while significantly reducing the computation and parameter counts. Experimental results demonstrate that, on the public datasets Butterfly20 and Flower and the self-built Cargo dataset, AGMG-lite achieves recognition accuracies of 88.53%, 92.16%, and 99.56%, respectively, with only 78.43M parameters and 14.82 GFLOPs, reductions of approximately 85% and 48% relative to AGMG-Net, and an inference speed of 4.7 milliseconds per image, an improvement of approximately 42.3%. These results confirm that, with recognition accuracy essentially unchanged, AGMG-lite significantly reduces parameter and computation counts, thereby supporting efficient deployment on edge devices and demonstrating strong applicability.
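
The three ideas named above (a lighter backbone, frequency-domain learning, and sample balancing) can be sketched in a few lines of PyTorch. Every concrete choice below is a hypothetical stand-in, since the abstract does not specify the exact components: MobileNetV3-Small for the compressed backbone, an FFT log-magnitude view for frequency-domain learning, and an inverse-class-frequency sampler for sample balancing.

import torch
import torch.nn as nn
import torchvision.models as models
from torch.utils.data import WeightedRandomSampler

# 1) Lightweight backbone: MobileNetV3-Small standing in for the compressed
#    AGMG-lite backbone (an assumption, not the thesis's stated choice).
model = models.mobilenet_v3_small(weights=None)
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 20)

# 2) Frequency-domain view: a per-channel 2-D FFT log-magnitude spectrum,
#    a simplified stand-in for the thesis's frequency-domain learning.
def frequency_view(x):
    spec = torch.fft.fft2(x, norm="ortho")  # complex spectrum per channel
    return torch.log1p(spec.abs())          # compress the dynamic range

# 3) Sample balancing: draw rare cargo classes as often as common ones by
#    weighting each sample with the inverse of its class frequency.
labels = torch.randint(0, 20, (1000,))      # dummy labels for illustration
counts = torch.bincount(labels, minlength=20).float().clamp(min=1)
weights = (1.0 / counts)[labels]
sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

x = torch.randn(2, 3, 224, 224)
logits = model(frequency_view(x))
print(logits.shape)  # torch.Size([2, 20])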

This study improves and optimizes the goods recognition models for a specific type of integrated security system from two perspectives: classification accuracy, and the algorithm's parameter and computation counts. For the identified problems, corresponding network structures are established and then validated through experiments, achieving the intended research objectives.
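
As a practical note, headline figures such as parameter count and GFLOPs can be measured for any PyTorch model in a few lines; the sketch below counts parameters directly and uses fvcore's FlopCountAnalysis (a common third-party counter, assumed installed) for the operation count. The model here is a placeholder, not the thesis's AGMG-lite.

import torch
import torchvision.models as models
from fvcore.nn import FlopCountAnalysis  # third-party FLOP counter (assumed)

model = models.mobilenet_v3_small(weights=None)

# Parameter count in millions: sum the element counts of all weight tensors.
params_m = sum(p.numel() for p in model.parameters()) / 1e6
print(f"params: {params_m:.2f}M")

# FLOPs (fvcore reports multiply-accumulates) for one 224x224 input image.
flops = FlopCountAnalysis(model, torch.randn(1, 3, 224, 224)).total()
print(f"GFLOPs: {flops / 1e9:.2f}")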

References:

[1] Liu Y, Lei Y B, Fan J L, et al. A survey of image classification based on few-shot learning[J]. Acta Automatica Sinica, 2021, 47(2): 297-315.

[2] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets[J]. Neural computation, 2006, 18(7): 1527-1554.

[3] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Advances in neural information processing systems, 2012, 25.

[4] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.

[5] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.

[6] Ding L, Ding S F, Zhang J, et al. Single image super-resolution reconstruction based on VGG energy loss[J]. Journal of Software, 2021, 32(11): 3659-3668.

[7] Huang H, Sun L J, Cao Y, et al. Attention-based multimodal sentiment analysis of short videos[J]. Journal of Graphics, 2021, 42(1): 8-14.

[8] Xie X, Cheng G, Wang J, et al. Oriented R-CNN for object detection[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 3520-3529.

[9] Wang J, Yang W, Li H C, et al. Learning center probability map for detecting objects in aerial images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 59(5): 4307-4323.

[10] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 779-788.

[11] Huang X, Wang X, Lv W, et al. PP-YOLOv2: A practical object detector[J]. arXiv preprint arXiv:2104.10419, 2021.

[12] Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023: 7464-7475.

[13] Qian W, Yang X, Peng S, et al. Learning modulated loss for rotated object detection[C]//Proceedings of the AAAI conference on artificial intelligence. 2021, 35(3): 2458-2466.

[14] Qiu T H, Wang L, Wang P, et al. Research on an object detection algorithm based on improved YOLOv5[J]. Computer Engineering and Applications, 2022, 58(13): 63-73.

[15] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in neural information processing systems, 2017, 30.

[16] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.

[17] Wu Y, Kan S, Zeng M, et al. Singularformer: learning to decompose self-attention to linearize the complexity of transformer[C]//Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence. 2023: 4433-4441.

[18] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale[C]//International Conference on Learning Representations. 2020.

[19] Zheng D, Dong W, Hu H, et al. Less is more: Focus attention for efficient detr[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 6674-6683.

[20] Lou Z H, Luo S Y. Vehicle-mounted infrared target detection based on YOLOX and Swin Transformer[J]. Infrared Technology, 2022, 44(11): 1167-1175.

[21] Zheng S, Lu J, Zhao H, et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 6881-6890.

[22] Chen H, Wang Y, Guo T, et al. Pre-trained image processing transformer[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 12299-12310.

[23] Zhu L, Wang X, Ke Z, et al. Biformer: Vision transformer with bi-level routing attention[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023: 10323-10333.

[24] Shi D S, Li J X, Liu Q S. Weakly supervised semantic segmentation with self-attention fusion modulation[J]. Journal of Image and Graphics, 2023, 28(12): 3758-3771.

[25] Yang D W, Chi J S, Mao L. Boundary-assisted weakly supervised semantic segmentation network[J]. Application Research of Computers, 2024, 41(2): 623-628, 634.

[26] Ru L, Zhan Y, Yu B, et al. Learning affinity from attention: End-to-end weakly-supervised semantic segmentation with transformers[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022: 16846-16855.

[27] Chen Z, Wang T, Wu X, et al. Class re-activation maps for weakly-supervised semantic segmentation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022: 969-978.

[28] Zemmal N, Azizi N, Sellami M, et al. Particle swarm optimization based swarm intelligence for active learning improvement: Application on medical data classification[J]. Cognitive Computation, 2020, 12: 991-1010.

[29] Liu F, Tian Y, Chen Y, et al. Acpl: Anti-curriculum pseudo-labelling for semi-supervised medical image classification[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022: 20697-20706.

[30] Hwang D, Ha J W, Shim H, et al. Entropy regularization for weakly supervised object localization[J]. Pattern Recognition Letters, 2023, 169: 1-7.

[31] Zhang L, Yang H. Adaptive attention augmentor for weakly supervised object localization[J]. Neurocomputing, 2021, 454: 474-482.

[32] Gao W, Wan F, Pan X, et al. Ts-cam: Token semantic coupled attention map for weakly supervised object localization[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 2886-2895.

[33] Swaminathan S, Garg D, Kannan R, et al. Sparse low rank factorization for deep neural network compression[J]. Neurocomputing, 2020, 398: 185-196.

[34] Zhang Y P, Zhou P C, Xue M G. Video snow removal based on tensor low-rank decomposition and non-subsampled shearlet transform[J]. Journal of Graphics, 2023, 44(5): 947-954.

[35] Zhang F, Huang Y, Fang Z Z, et al. Loss-minimized post-training parameter quantization method for convolutional neural networks[J]. Journal on Communications, 2022, 43(4): 114-122.

[36] Alkhulaifi A, Alsahli F, Ahmad I. Knowledge distillation in deep learning and its applications[J]. PeerJ Computer Science, 2021, 7: e474.

[37] Guo Q, Wu X J, Kittler J, et al. Differentiable neural architecture learning for efficient neural networks[J]. Pattern recognition, 2022, 126: 108448.

[38] Heuillet A, Nasser A, Arioui H, et al. Efficient automation of neural network design: A survey on differentiable neural architecture search[J]. arXiv preprint arXiv:2304.05405, 2023.

[39] Gao Y, Zhang B, Qi X, et al. DPACS: hardware accelerated dynamic neural network pruning through algorithm-architecture co-design[C]//Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 2023: 237-251.

[40] Wang G D, Shao P, Wang G Y, et al. Image dehazing based on low-rank decomposition and pixel scrambling[J]. Computer Engineering, 2022, 48(12): 212-217.

[41] Qiu Y Y, Gao Z. Recognition of abnormal gait image sequences based on low-rank decomposition[J]. Computer Simulation, 2021, 38(6): 415-418.

[42] Zhao X J, Li H L. Deep neural network compression algorithm based on a hybrid mechanism[J]. Journal of Computer Applications, 2023, 43(9): 2686-2691.

[43] Han J J, Liu J Y, Gong W J, et al. Research on object detection optimization for mobile devices[J]. Computer Engineering and Applications, 2022, 58(24): 12-28.

[44] Sun J, Liu Z, Wen J, et al. Multiple hierarchical compression for deep neural network toward intelligent bearing fault diagnosis[J]. Engineering Applications of Artificial Intelligence, 2022, 116: 105498.

[45] Wang L, Yoon K J. Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks[J]. IEEE transactions on pattern analysis and machine intelligence, 2021, 44(6): 3048-3068.

[46] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.

[47] Ba L J, Caruana R. Do deep nets really need to be deep?[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2. 2014: 2654-2662.

[48] Yang C, Xie L, Su C, et al. Snapshot distillation: Teacher-student optimization in one generation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 2859-2868.

[49] Wen T, Lai S, Qian X. Preparing lessons: Improve knowledge distillation with better supervision[J]. Neurocomputing, 2021, 454: 25-33.

[50] Qu H, Su X, Wang Y, et al. Noise-separated adaptive feature distillation for robust speech recognition[J]. IEEE Signal Processing Letters, 2023.

[51] Passalis N, Tefas A. Learning deep representations with probabilistic knowledge transfer[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 268-284.

[52] He C Y, Cheng S X, Xu L F, et al. A multimodal continual learning method based on differential feature distillation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024.

[53] Lassance C, Bontonou M, Hacene G B, et al. Deep geometric knowledge distillation with graphs[C]//ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020: 8484-8488.

[54] Park W, Kim D, Lu Y, et al. Relational knowledge distillation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 3967-3976.

[55] Yu L, Li Y, Weng S, et al. Adaptive multi-teacher softened relational knowledge distillation framework for payload mismatch in image steganalysis[J]. Journal of Visual Communication and Image Representation, 2023, 95: 103900.

[56] Yuan L, Tay F E H, Li G, et al. Revisiting knowledge distillation via label smoothing regularization[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 3903-3911.

[57] Yang C, Xie L, Su C, et al. Snapshot distillation: Teacher-student optimization in one generation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 2859-2868.

[58] Mirzadeh S I, Farajtabar M, Li A, et al. Improved knowledge distillation via teacher assistant[C]//Proceedings of the AAAI conference on artificial intelligence. 2020, 34(04): 5191-5198.

[59] Liu B, Hu B B, Zhao M, et al. Model Compression Algorithm via Reinforcement Learning and Knowledge Distillation[J]. Mathematics, 2023, 11(22): 4589.

[60] Liu Y, Cao J, Li B, et al. Learning to explore distillability and sparsability: a joint framework for model compression[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(3): 3378-3395.

[61] Rui L, Yang S, Chen S, et al. Smart network maintenance in an edge cloud computing environment: An adaptive model compression algorithm based on model pruning and model clustering[J]. IEEE Transactions on Network and Service Management, 2022, 19(4): 4165-4175.

[62] Zhang C, Li C, Guo B, et al. Neural Network Compression via Low Frequency Preference[J]. Remote Sensing, 2023, 15(12): 3144.

[63] Gao Y Y, Yu Z H, Du F, et al. Label-free network pruning algorithm based on Bayesian optimization[J]. Journal of Computer Applications, 2023, 43(1): 30.

[64] Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size[J]. arXiv preprint arXiv:1602.07360, 2016.

[65] Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

[66] Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.

[67] Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 1314-1324.

[68] Zhou B, Khosla A, Lapedriza A, et al. Learning deep features for discriminative localization[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2921-2929.

[69] Gao Y, Liu J, Li W, et al. Augmented Grad-CAM++: Super-Resolution Saliency Maps for Visual Interpretation of Deep Neural Network[J]. Electronics, 2023, 12(23): 4846.

[70] Jiang P T, Han L H, Hou Q, et al. Online attention accumulation for weakly supervised semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(10): 7062-7077.

[71] Vincent L, Soille P. Watersheds in digital spaces: an efficient algorithm based on immersion simulations[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 1991, 13(6): 583-598.

[72] Dai Z, Liu H, Le Q V, et al. Coatnet: Marrying convolution and attention for all data sizes[J]. Advances in neural information processing systems, 2021, 34: 3965-3977.

[73] Zhu Y, Min W, Jiang S. Attribute-guided feature learning for few-shot image recognition[J]. IEEE Transactions on Multimedia, 2020, 23: 1200-1209.

[74] Zeng X, Wu W, Tian G, et al. Deep superpixel convolutional network for image recognition[J]. IEEE Signal Processing Letters, 2021, 28: 922-926.

[75] Yi Y K, Zhang Y, Myung J. House style recognition using deep convolutional neural network[J]. Automation in Construction, 2020, 118: 103307.

[76] Koyun O C, Keser R K, Akkaya I B, et al. Focus-and-Detect: A small object detection framework for aerial images[J]. Signal Processing: Image Communication, 2022, 104: 116675.

[77] Wang S, Xu M, Sun Y, et al. Improved single shot detection using DenseNet for tiny target detection[J]. Concurrency and Computation: Practice and Experience, 2023, 35(2): e7491.

[78] Dong X, Qin Y, Fu R, et al. Multiscale deformable attention and multilevel features aggregation for remote sensing object detection[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5.

[79] Ebbinghaus H. Memory: A contribution to experimental psychology[J]. Annals of neurosciences, 2013, 20(4): 155.

[80] Zhang H, Wu C, Zhang Z, et al. Resnest: Split-attention networks[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022: 2736-2746.

[81] Shen X, Wang Y, Lin M, et al. Deepmad: Mathematical architecture design for deep convolutional neural network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 6163-6173.

[82] Shen X, Yang J, Wei C, et al. Dct-mask: Discrete cosine transform mask representation for instance segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 8720-8729.

[83] Wang Q, Wu B, Zhu P, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 11534-11542.

[84] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE international conference on computer vision. 2017: 2980-2988.

[85] He T, Zhang Z, Zhang H, et al. Bag of tricks for image classification with convolutional neural networks[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 558-567.

CLC Number:

 TP391

Open Access Date:

 2024-06-17
