Thesis Information

Title: Research on a Hat Detection Model in Complex Scenes Based on Deep Learning
Author: Deng Yong
Student ID: 20208088021
Confidentiality: Public
Thesis Language: Chinese
Discipline Code: 083500
Discipline: Engineering - Software Engineering
Student Type: Master's
Degree: Master of Engineering
Degree Year: 2023
Institution: Xi'an University of Science and Technology
School: College of Computer Science and Technology
Major: Software Engineering
Research Direction: Artificial Intelligence and Information Processing
First Supervisor: Luo Xiaoxia
First Supervisor's Institution: Xi'an University of Science and Technology
Submission Date: 2023-06-13
Defense Date: 2023-06-06
Keywords: Hat Detection; Adaptive Convolution; Deformable Convolution; Neural Architecture Search

Abstract (Chinese):

In recent years, hat detection, an important task in object detection research, has been widely applied in fields such as industrial production, transportation, and security monitoring. However, existing hat detection datasets cover too few object categories and too narrow a range of scenes to meet the needs of hat detection in real-world complex scenarios, and existing detection models still face challenges when images exhibit scene changes, blurred objects, and similar phenomena. To overcome these problems, the main research content and results of this thesis are as follows:

(1) To address the limited object categories and scene diversity of existing datasets, this thesis collects image data from different scenarios including traffic, industrial production, and security monitoring, and filters images from public hat detection datasets, obtaining 4,500 images in total. Seven hat categories are annotated as detection objects according to practical relevance, and the data is divided into training, validation, and test sets, forming a dataset named HAT4.5K. On this dataset, a series of experiments is conducted with the common one-stage models YOLOv3, RetinaNet, and FCOS and the two-stage models Grid R-CNN, Faster R-CNN, and Fast R-CNN. Based on the results, Grid R-CNN, which achieves the highest accuracy, is selected as the hat detection baseline and is further improved and optimized to resolve missed detections, false detections, and imprecise localization and recognition.

(2) To address the missed and false detections of the Grid R-CNN baseline, this thesis designs a Multi-stage Adaptive Region Proposal Network (MA RPN) based on adaptive convolution and adaptive sampling, and builds a Multi-stage Adaptive hat Detection model (MADet). First, in MA RPN, adaptive convolution adjusts the sampling positions and shape of the convolution kernel according to the shape and scale of the object region, extracting features effectively. Second, in MA RPN's sample assignment stage, an adaptive sampling strategy dynamically assigns positive and negative samples. Finally, Focal Loss is introduced to guide the training of MA RPN and balance the loss. Ablation results on the HAT4.5K dataset show that MADet improves detection accuracy by 6.4% over Grid R-CNN and small-object detection accuracy by 5.1%, effectively reducing false and missed detections.
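For concreteness, the dynamic positive/negative assignment can be sketched in the spirit of adaptive training sample selection (ATSS); the version below is a simplification (no per-level center-distance candidate filtering), and the function name and the choice k=9 are illustrative assumptions rather than MA RPN's actual procedure.

```python
import torch

def adaptive_assign(ious, k=9):
    """ATSS-style adaptive assignment sketch (after Zhang et al.): for each
    ground truth, take its top-k anchors by IoU as candidates and mark as
    positive those whose IoU exceeds the candidates' mean + std.

    ious: (num_anchors, num_gt) IoU matrix.
    Returns a boolean (num_anchors, num_gt) positive mask.
    """
    topk = torch.topk(ious, min(k, ious.shape[0]), dim=0).values  # (k, num_gt)
    thresh = topk.mean(0) + topk.std(0)                           # per-gt threshold
    return ious >= thresh
```

The per-object threshold adapts to how well each ground truth is covered by the anchor set, so small or oddly shaped hats are not starved of positive samples by a single fixed IoU cutoff.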

(3) To address MADet's imprecise localization and classification, this thesis proposes an improved model, CMADet. First, deformable convolution is incorporated into the backbone network to improve the extraction of object shape features. Second, neural architecture search is applied to fuse features across the scales of the feature pyramid network (FPN), where a bottom-up fusion path strengthens the positional information of high-level features. Then, in RoI Pooling, a lightweight deformable convolution learns the geometric offsets of candidate-region features, achieving more precise feature alignment and extracting complete candidate-region features. Finally, during pooling, a single convolution layer learns a set of pooling weights that preserve the important information after pooling and enhance the feature representation. Experiments show that CMADet improves detection accuracy (AP75) by 2.9% over MADet, effectively improving localization and classification performance.
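To make the bottom-up cross-scale fusion concrete, the sketch below adds a bottom-up path over the FPN outputs in the spirit of PANet; the specific connections found by the thesis's neural architecture search are not reproduced here, and all names, channel counts, and shapes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BottomUpFusion(nn.Module):
    """Sketch of a bottom-up fusion path over FPN outputs: each level is
    fused with the downsampled level below it, so high-level features
    regain low-level positional detail."""
    def __init__(self, channels=256, levels=4):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(levels - 1)]
        )

    def forward(self, feats):  # feats: [P2, P3, P4, P5], highest resolution first
        outs = [feats[0]]
        for conv, f in zip(self.convs, feats[1:]):
            down = F.max_pool2d(outs[-1], kernel_size=2)  # downsample previous level
            outs.append(conv(f + down))
        return outs

# Usage with dummy pyramid levels at strides 4, 8, 16, 32:
feats = [torch.randn(1, 256, s, s) for s in (64, 32, 16, 8)]
outs = BottomUpFusion()(feats)
```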

Abstract (English):

In recent years, hat detection has gained significant attention as a crucial task in object detection research, finding applications in fields such as manufacturing, traffic management, and public safety monitoring. However, existing hat detection datasets lack diversity in object categories and scene variation, limiting their suitability for detecting hats in complex real-world scenarios. Moreover, current detection models face challenges when confronted with scene changes and blurred object instances in images. To address these issues, this thesis presents the following research objectives and outcomes:

(1) To address the limited object categories and scene diversity of existing datasets, this study collects image data from scenarios including traffic, industrial production, and security monitoring, and filters additional images from publicly available hat detection datasets, yielding 4,500 images in total. Seven types of hats are labeled as detection objects based on practical relevance, and the dataset is divided into training, validation, and test sets, named HAT4.5K. Using this dataset, a series of experiments is conducted on popular one-stage models (YOLOv3, RetinaNet, and FCOS) and two-stage models (Grid R-CNN, Fast R-CNN, and Faster R-CNN). Among these, Grid R-CNN achieves the highest accuracy and is selected as the benchmark hat detection model; further improvements and optimizations are then made to address missed detections, false detections, and inaccurate localization and recognition in diverse, complex real-world scenes.
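As an illustration of the dataset preparation step, the sketch below carves a COCO-style annotation file into the three subsets reproducibly. The file name hat4.5k.json and the 80/10/10 ratio are assumptions for the example; the thesis does not state the exact split used for HAT4.5K.

```python
import json
import random

# Reproducible train/val/test split of a COCO-style annotation file.
# "hat4.5k.json" and the 80/10/10 ratio are assumptions for illustration.
random.seed(0)

with open("hat4.5k.json") as f:
    coco = json.load(f)

image_ids = [img["id"] for img in coco["images"]]
random.shuffle(image_ids)

n = len(image_ids)
splits = {
    "train": set(image_ids[: int(0.8 * n)]),
    "val": set(image_ids[int(0.8 * n): int(0.9 * n)]),
    "test": set(image_ids[int(0.9 * n):]),
}

for name, ids in splits.items():
    subset = {
        "images": [img for img in coco["images"] if img["id"] in ids],
        "annotations": [a for a in coco["annotations"] if a["image_id"] in ids],
        "categories": coco["categories"],  # the 7 hat classes
    }
    with open(f"hat4.5k_{name}.json", "w") as out:
        json.dump(subset, out)
```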

(2) To address the issues of missed detection and false detection in the Grid R-CNN benchmark model, this thesis proposes a Multi-stage Adaptive Region Proposal Network (MA RPN) based on adaptive convolution and adaptive sampling techniques, and constructs a Multi-stage Adaptive hat Detection model (MADet). Firstly, in MA RPN, the sampling positions and shape of the convolution kernel are adjusted according to the shape and scale of the object region through adaptive convolution, so that features are extracted effectively. Secondly, in the sample assignment stage of MA RPN, an adaptive sampling strategy dynamically assigns positive and negative samples. Finally, Focal Loss is introduced to guide the training of MA RPN, ensuring balanced losses. Ablation experiments conducted on the HAT4.5K dataset demonstrate that the MADet model achieves a detection accuracy 6.4% higher than that of the Grid R-CNN model, with a 5.1% gain on small-scale objects, effectively reducing false and missed detections.
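A minimal sketch of two of the ingredients above, under stated assumptions: the adaptive convolution follows the anchor-guided sampling idea of Cascade RPN, simplified here to anchors centered on each location so that only the kernel's spread adapts, and the focal loss is the standard formulation of Lin et al. All names and shapes are illustrative, not the thesis's code.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import deform_conv2d

def adaptive_offsets(anchor_hw, stride, kernel=3):
    """Offsets that rescale a kernel x kernel sampling grid to each anchor.

    Simplification: one anchor per location, centered on it, so only the
    kernel's spread adapts (the center shift is omitted).
    anchor_hw: (N, 2, H, W) anchor (height, width) in image pixels.
    Returns (N, 2*kernel*kernel, H, W) offsets for deform_conv2d.
    """
    half = (kernel - 1) / 2
    idx = torch.arange(kernel, dtype=anchor_hw.dtype) - half  # e.g. -1, 0, 1
    half_h = anchor_hw[:, 0] / (2 * stride)  # anchor half-extent, feature units
    half_w = anchor_hw[:, 1] / (2 * stride)
    offs = []
    for i in idx:       # kernel row: desired position i*half_h/half, default i
        for j in idx:   # kernel col
            offs.append(i * (half_h / half - 1.0))  # dy
            offs.append(j * (half_w / half - 1.0))  # dx
    return torch.stack(offs, dim=1)

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Standard focal loss: down-weights easy examples to balance training."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Usage: a 3x3 kernel spread over 64x64-pixel anchors on a stride-8 feature map.
feat = torch.randn(1, 256, 50, 50)
weight = torch.randn(256, 256, 3, 3) * 0.01
anchor_hw = torch.full((1, 2, 50, 50), 64.0)
out = deform_conv2d(feat, adaptive_offsets(anchor_hw, stride=8), weight, padding=1)
```

When an anchor's half-extent equals the kernel's default spread (a 16-pixel anchor on a stride-8 map for a 3x3 kernel), the offsets vanish and the layer reduces to an ordinary convolution.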

(3) To address the issue of inaccurate localization and classification in the MADet model, this thesis proposes an improved model called CMADet. Firstly, deformable convolution is integrated into the backbone network to enhance the model's ability to extract object shape features. Secondly, using neural architecture search, the feature pyramid network (FPN) is improved by fusing features across different scales; a bottom-up fusion path enriches higher-level features with positional information. Additionally, a lightweight deformable convolution is applied in RoI Pooling to learn geometric offsets of candidate-region features, enabling more precise feature alignment and extraction of complete candidate-region features. Finally, during the pooling process, a single convolutional layer learns a set of pooling weights to preserve important information after feature pooling, thus enhancing feature representation. Experimental results demonstrate that the CMADet model achieves a 2.9% improvement in detection accuracy (AP75) compared to MADet, effectively enhancing the model's localization and classification performance.
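The sketch below illustrates two further pieces of (3): a backbone block whose 3x3 convolution is replaced by deformable convolution with offsets predicted from the input (following Dai et al.), and a pooling head in which a 1x1 convolution learns a spatial weight map over each RoI feature. Class names and the 7x7 RoI size are illustrative assumptions, not the thesis's implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """Backbone block with deformable convolution; a plain 3x3 conv
    predicts the (dy, dx) offsets for each of the 9 kernel points."""
    def __init__(self, channels):
        super().__init__()
        self.offset = nn.Conv2d(channels, 2 * 3 * 3, 3, padding=1)
        nn.init.zeros_(self.offset.weight)  # start as an ordinary convolution
        nn.init.zeros_(self.offset.bias)
        self.dconv = DeformConv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return self.dconv(x, self.offset(x))

class WeightedPool(nn.Module):
    """Learned pooling weights: a 1x1 conv scores every spatial position
    of an RoI feature, the scores are softmax-normalized, and the pooled
    vector is the weighted sum, keeping information a hard max would drop."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, 1)

    def forward(self, roi_feat):                         # (N, C, 7, 7)
        n, c, h, w = roi_feat.shape
        a = self.score(roi_feat).view(n, 1, h * w).softmax(-1)
        return (roi_feat.view(n, c, h * w) * a).sum(-1)  # (N, C)

# Usage on dummy RoI features, e.g. the output of RoIAlign:
vec = WeightedPool(256)(torch.randn(8, 256, 7, 7))  # -> (8, 256)
```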

CLC Number: TP391
Release Date: 2023-06-15
