Thesis Information

Chinese title:

 基于YOLOv3的目标检测算法研究 (Research on Object Detection Algorithm Based on YOLOv3)

Name:

 李孔 (Li Kong)

Student ID:

 18207041016    

Confidentiality level:

 Public

Language:

 Chinese (chi)

Discipline code:

 081001    

Discipline:

 Engineering - Information and Communication Engineering - Communication and Information Systems

Student type:

 Master's

Degree:

 Master of Engineering

Degree year:

 2021    

Institution:

 Xi'an University of Science and Technology

School:

 College of Communication and Information Engineering

Major:

 Communication and Information Systems

Research area:

 Digital Image Processing

First supervisor:

 李明明 (Li Mingming)

First supervisor's institution:

 Xi'an University of Science and Technology

Submission date:

 2021-06-18    

Defense date:

 2021-06-04    

English title:

 Research on Object Detection Algorithm Based on YOLOv3    

Chinese keywords:

 Object detection; YOLOv3; Attention; Data augmentation; Model compression

English keywords:

 Object detection; YOLOv3; Attention; Data augmentation; Model compression

Chinese abstract:

Object detection underpins a large number of high-level vision tasks such as image analysis and understanding. As one of the most popular object detection algorithms, YOLOv3 is widely applied thanks to its strong generalization ability. To meet the deployment requirements of terminal devices, namely high detection accuracy and low memory footprint, this thesis studies object detection algorithms from two aspects, improving detection accuracy and compressing model parameters, and finally improves YOLOv3 as the baseline.

To improve detection accuracy, two angles are pursued: attention methods that can be embedded into the model, and data augmentation methods applied during training. For attention, anchor box information is introduced into the attention method as prior knowledge, in keeping with YOLOv3's detection principle, yielding an improved attention method. Comparative experiments show that YOLOv3 with the improved attention method outperforms the original YOLOv3 by 0.7% and 0.4% mAP on the VOC2007 and VOC2012 datasets, respectively. For data augmentation, MixUp and Mosaic are combined into an improved augmentation method. To verify the generality of the improvement, comparative experiments with the baseline network PyramidNet on CIFAR-100 and CIFAR-10 show that training with the improved augmentation reduces the Top-1 error rate by 2.47% and 1.31%, respectively. Sketches of both building blocks follow.
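
To make the attention component concrete, here is a minimal PyTorch sketch of a squeeze-and-excitation (SE) channel-attention block of the kind cited above [38]. How the anchor box prior is injected is the thesis's own contribution and is not described in this abstract, so the sketch shows only the generic module; the channel count and reduction ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention (Hu et al. [38]).

    A generic building block of the kind the thesis modifies; the
    anchor-prior variant proposed in the thesis is not reproduced here.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial average
        self.fc = nn.Sequential(             # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                          # reweight channels

# usage: attach after a convolutional block, e.g. a 13x13 YOLOv3 feature map
feat = torch.randn(2, 256, 13, 13)
print(SEBlock(256)(feat).shape)  # torch.Size([2, 256, 13, 13])
```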

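MixUp [35] forms a convex combination of two samples (and, during training, of their labels or losses), while Mosaic stitches four images onto one canvas around a random center. The abstract does not spell out how the two are combined, so the NumPy sketch below shows one plausible composition, mixing two independently built mosaics; the canvas size, Beta parameter, and the omission of label remapping are assumptions.

```python
import numpy as np

def mosaic(imgs: list, size: int = 416) -> np.ndarray:
    """Stitch four equally sized images onto one canvas around a random center
    (the matching box-label remapping is omitted in this sketch)."""
    canvas = np.zeros((size, size, 3), dtype=imgs[0].dtype)
    cx = np.random.randint(size // 4, 3 * size // 4)  # random split point
    cy = np.random.randint(size // 4, 3 * size // 4)
    regions = [(slice(0, cy), slice(0, cx)), (slice(0, cy), slice(cx, size)),
               (slice(cy, size), slice(0, cx)), (slice(cy, size), slice(cx, size))]
    for img, (ys, xs) in zip(imgs, regions):
        h, w = ys.stop - ys.start, xs.stop - xs.start
        canvas[ys, xs] = img[:h, :w]                  # crop each source to its quadrant
    return canvas

def mixup(a: np.ndarray, b: np.ndarray, alpha: float = 1.5):
    """MixUp [35]: convex combination of two samples; the loss is weighted by lam."""
    lam = np.random.beta(alpha, alpha)
    return lam * a + (1.0 - lam) * b, lam

# one plausible combination (an assumption, not necessarily the thesis's recipe):
# apply MixUp on top of two independently built mosaics
imgs = [np.random.rand(416, 416, 3) for _ in range(8)]
mixed, lam = mixup(mosaic(imgs[:4]), mosaic(imgs[4:]))
print(mixed.shape, round(float(lam), 3))
```
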
For model compression, an improved strategy is proposed: convolution kernels whose proportion of zero activations exceeds a threshold are filtered out first, and FPGM pruning is then applied to the rest (see the sketch after this paragraph). To verify the generality of the improvement, comparative experiments on CIFAR-10 with ResNet-56 as the baseline show that the improved FPGM pruning strategy reduces FLOPs by about 53% relative to the unpruned model while raising accuracy by about 0.05%.
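
As a rough illustration of the improved pruning strategy, the sketch below first discards filters whose share of zero activations (the criterion of Hu et al. [49]) exceeds a threshold, then scores the remaining filters by their summed distance to all other filters in the layer, as in FPGM [52], pruning those closest to the geometric median. The threshold, pruning ratio, and tensor shapes are illustrative assumptions.

```python
import numpy as np

def apoz(activations: np.ndarray) -> np.ndarray:
    """Average percentage of zeros per filter (Hu et al. [49]).
    activations: post-ReLU feature maps of shape (N, C, H, W)."""
    return (activations == 0).mean(axis=(0, 2, 3))

def fpgm_scores(weights: np.ndarray) -> np.ndarray:
    """FPGM-style redundancy score (He et al. [52]): sum of distances from each
    filter to all others; the smallest sums lie nearest the geometric median."""
    flat = weights.reshape(weights.shape[0], -1)      # (C_out, k*k*C_in)
    dists = np.linalg.norm(flat[:, None] - flat[None, :], axis=-1)
    return dists.sum(axis=1)

def select_pruned(weights, activations, apoz_thresh=0.9, prune_ratio=0.3):
    """Improved strategy from the abstract (a sketch under assumed thresholds):
    drop filters whose zero-activation share exceeds apoz_thresh, then prune
    the remaining filters closest to the geometric median."""
    dead = np.where(apoz(activations) > apoz_thresh)[0]
    alive = np.setdiff1d(np.arange(weights.shape[0]), dead)
    scores = fpgm_scores(weights[alive])
    n = int(prune_ratio * weights.shape[0])
    by_fpgm = alive[np.argsort(scores)[:max(n - dead.size, 0)]]
    return np.union1d(dead, by_fpgm)                  # indices of filters to remove

# toy usage: 16 filters of a 3x3 conv over 8 input channels
w = np.random.randn(16, 8, 3, 3)
a = np.maximum(np.random.randn(4, 16, 13, 13), 0)    # fake post-ReLU activations
print(select_pruned(w, a))
```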

Combining the improvements above, comparative experiments are conducted on the MS COCO dataset. With the improved attention method added to the YOLOv3 network structure, the improved data augmentation used during training, and the resulting inference model pruned with the improved FPGM strategy, the final AP reaches 37.9%, 6.9% higher than before the improvements, while the number of network parameters falls from 59.6M to 33.1M. Although the improved YOLOv3 still trails some top-performing algorithms, it is a substantial gain over YOLOv3 itself.

English abstract:

Object detection is the basis of a large number of high-level vision tasks such as image analysis and understanding. As one of the most popular object detection algorithms, YOLOv3 is widely applied due to its good generalization ability. To meet the deployment requirements of terminal devices, namely high detection accuracy and low memory usage, this dissertation studies object detection algorithms from two aspects, improving detection accuracy and compressing model parameters, and finally improves YOLOv3 as the baseline.

In terms of improving detection accuracy, we start from two perspectives: the attention method that can be embedded in the model and the data augmentation method used in model training. For attention, combined with the detection principle of the YOLOv3 algorithm, anchor box information is introduced into the attention method as prior knowledge, yielding an improved attention method. Comparative experiments show that, compared with the original YOLOv3 algorithm, YOLOv3 with the improved attention method achieves mAP improvements of 0.7% and 0.4% on the VOC2007 and VOC2012 datasets, respectively. For data augmentation, MixUp and Mosaic are combined to obtain an improved data augmentation method. To verify the general effect of the improved method, comparative experiments are conducted with the benchmark network PyramidNet on the CIFAR-100 and CIFAR-10 datasets. The results show that when the improved data augmentation method is used during model training, the Top-1 error rate is reduced by 2.47% and 1.31%, respectively, compared with training without it.

In terms of model compression, an improved strategy is proposed: convolution kernels whose proportion of zero activations exceeds a threshold are filtered out before FPGM pruning is performed. To verify the general effect of the improved method, comparative experiments are conducted on the CIFAR-10 dataset with ResNet-56 as the benchmark network. The results show that with the improved FPGM pruning strategy, the FLOPs of ResNet-56 are reduced by about 53% compared with the unpruned model, while the accuracy increases by about 0.05%.

Combining the improved methods proposed above, comparative experiments are carried out on the MS COCO dataset. The results show that when the improved attention method is added to the YOLOv3 network structure, the improved data augmentation method is used during training, and the resulting inference model is pruned with the improved FPGM pruning strategy, the final AP reaches 37.9%, an improvement of 6.9% over the unmodified model, while the number of network parameters is reduced from 59.6M to 33.1M. Although the improved YOLOv3 algorithm still lags behind other algorithms with excellent performance, it achieves a large performance improvement over YOLOv3 itself.

References:

[1] Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25: 1097-1105.

[2] Lowe D G. Distinctive Image Features from Scale-Invariant Keypoints[J]. International Journal of Computer Vision, 2004, 60(2):91-110.

[3] Ke Y, Sukthankar R. PCA-SIFT: A more distinctive representation for local image descriptors[C]//IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2004:506-513.

[4] Lienhart R, Maydt J. An extended set of haar-like features for rapid object detection[C]//Proceedings of the IEEE International Conference on Image Processing. 2002: 900-903.

[5] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2005: 886-893.

[6] Felzenszwalb P, McAllester D, Ramanan D. A discriminatively trained, multiscale, deformable part model[C]//IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2008: 1-8.

[7] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2014: 580-587.

[8] Girshick R. Fast r-cnn[C]//Proceedings of the IEEE International Conference on Computer Vision(ICCV). 2015: 1440-1448.

[9] Ren S, He K , Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6):1137-1149.

[10] He K, Gkioxari G, Dollár P, et al. Mask r-cnn[C]//Proceedings of the IEEE International Conference on Computer Vision(ICCV). 2017: 2961-2969.

[11] Liu W, Anguelov D, Erhan D, et al. Ssd: Single shot multibox detector[C]// Proceedings of the European Conference on Computer Vision(ECCV). 2016: 21-37.

[12] Fu C Y, Liu W, Ranga A, et al. Dssd: Deconvolutional single shot detector[J]. arXiv preprint arXiv:1701.06659, 2017.

[13] Shen Z, Liu Z, Li J, et al. Dsod: Learning deeply supervised object detectors from scratch[C]//Proceedings of the IEEE International Conference on Computer Vision(ICCV). 2017: 1919-1927.

[14] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2016: 779-788.

[15] Redmon J, Farhadi A. YOLO9000: better, faster, stronger[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2017: 7263-7271.

[16] Redmon J, Farhadi A. Yolov3: An incremental improvement[J]. arXiv preprint arXiv:1804.02767, 2018.

[17] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6):1137-1149.

[18] Kampffmeyer M, Salberg A B, Jenssen R. Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2016: 1-9.

[19] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.

[20] Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25: 1097-1105.

[21] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[C]//International Conference on Learning Representations(ICLR). 2015.

[22] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2015: 1-9.

[23] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2016: 770-778.

[24] Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2017: 4700-4708.

[25] Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size[J]. arXiv preprint arXiv:1602.07360, 2016.

[26] Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

[27] Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2018: 6848-6856.

[28] Rezatofighi H, Tsoi N, Gwak J Y, et al. Generalized intersection over union: A metric and a loss for bounding box regression[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2019: 658-666.

[29] Zheng Z, Wang P, Liu W, et al. Distance-IoU loss: Faster and better learning for bounding box regression[C]//Proceedings of the AAAI Conference on Artificial Intelligence(AAAI). 2020, 34(07): 12993-13000.

[30] Neubeck A, Van Gool L. Efficient non-maximum suppression[C]//Proceedings of the IEEE International Conference on Pattern Recognition. 2006: 850-855.

[31] Bodla N, Singh B, Chellappa R, et al. Soft-NMS-improving object detection with one line of code[C]//Proceedings of the IEEE International Conference on Computer Vision(ICCV). 2017: 5561-5569.

[32] Liu S, Huang D, Wang Y. Adaptive nms: Refining pedestrian detection in a crowd[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2019: 6459-6468.

[33] Lin T Y, Maire M, Belongie S, et al. Microsoft coco: Common objects in context[C]//Proceedings of the European Conference on Computer Vision(ECCV). 2014: 740-755.

[34] Buslaev A, Iglovikov V I, Khvedchenya E, et al. Albumentations: fast and flexible image augmentations[J]. Information, 2020, 11(2): 125.

[35] Zhang H, Cisse M, Dauphin Y N, et al. mixup: Beyond empirical risk minimization[C]//International Conference on Learning Representations(ICLR). 2018.

[36] Yun S, Han D, Oh S J, et al. Cutmix: Regularization strategy to train strong classifiers with localizable features[C]//Proceedings of the IEEE International Conference on Computer Vision(ICCV). 2019: 6023-6032.

[37] Xu Chengji, Wang Xiaofeng, Yang Yadong. Attention-YOLO: YOLO detection algorithm with an attention mechanism[J]. Computer Engineering and Applications, 2019, 55(06): 13-23+125.

[38] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2018: 7132-7141.

[39] Woo S, Park J, Lee J Y, et al. Cbam: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision(ECCV). 2018: 3-19.

[40] Han S, Mao H, Dally W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding[C]//International Conference on Learning Representations(ICLR). 2016.

[41] Lebedev V, Ganin Y, Rakhuba M, et al. Speeding-up convolutional neural networks using fine-tuned cp-decomposition[C]//International Conference on Learning Representations(ICLR). 2015.

[42] Hubara I, Courbariaux M, Soudry D, et al. Binarized neural networks[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016: 4114-4122.

[43] Romero A, Ballas N, Kahou S E, et al. Fitnets: Hints for thin deep nets[C]//International Conference on Learning Representations(ICLR). 2015.

[44] Elsken T, Metzen J H, Hutter F. Neural architecture search: A survey[J]. Journal of Machine Learning Research, 2019, 20(55): 1-21.

[45] Crowley E J, Turner J, Storkey A, et al. A closer look at structured pruning for neural network compression[J]. arXiv preprint, 2018.

[46] Ye J, Lu X, Lin Z, et al. Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers[C]//International Conference on Learning Representations(ICLR). 2018.

[47] Wen W, Wu C, Wang Y, et al. Learning structured sparsity in deep neural networks[C]//Proceedings of the International Conference on Neural Information Processing Systems (NIPS). 2016:2074-2082.

[48] Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming[C]//Proceedings of the IEEE International Conference on Computer Vision(ICCV). 2017: 2736-2744.

[49] Hu H, Peng R, Tai Y W, et al. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures[J]. arXiv preprint arXiv:1607.03250, 2016.

[50] Molchanov P, Tyree S, Karras T, et al. Pruning convolutional neural networks for resource efficient inference[C]//International Conference on Learning Representations(ICLR). 2017.

[51] Luo J H, Wu J, Lin W. Thinet: A filter level pruning method for deep neural network compression[C]//Proceedings of the IEEE International Conference on Computer Vision(ICCV). 2017: 5058-5066.

[52] He Y, Liu P, Wang Z, et al. Filter pruning via geometric median for deep convolutional neural networks acceleration[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2019: 4340-4349.

[53] Liu Z, Sun M, Zhou T, et al. Rethinking the value of network pruning[C]//International Conference on Learning Representations(ICLR). 2019:1-21.

CLC number:

 TP391    

Open access date:

 2021-06-18    
