Thesis Information

Thesis Title (Chinese):

复杂场景下车牌图像定位和超分增强研究与实现 (Research and Implementation of License Plate Image Location and Super-Resolution Enhancement in Complex Scenes)

Name:

Liu Zhen (刘震)

Student ID:

19308207010

Confidentiality Level:

Public

Thesis Language:

Chinese

Discipline Code:

085211

Discipline:

Engineering - Engineering - Computer Technology

Student Type:

Master's

Degree Level:

Master of Engineering

Degree Year:

2023

Institution:

Xi'an University of Science and Technology

Department:

College of Computer Science and Technology

Major:

Computer Technology

Research Area:

Image Processing

Primary Supervisor:

Ma Tian (马天)

Primary Supervisor's Institution:

Xi'an University of Science and Technology

Submission Date:

2023-01-09

Defense Date:

2022-12-06

Thesis Title (English):

Research and Implementation of Location and Super-Resolution Enhancement of License Plate Images in Complex Scenes

Keywords (Chinese):

目标检测; 复杂场景; 车牌定位; 图像超分辨率; 注意力机制

Keywords (English):

Object Detection; Complex Scenes; License Plate Location; Image Super-resolution; Attention Mechanism

Abstract (Chinese):

With the arrival of the big data era, object detection and image super-resolution techniques have developed rapidly. Accurately locating license plates in vehicle images captured in complex scenes and extracting usable plate information is vital to criminal investigation and to the construction of smart urban traffic. This thesis therefore studies accurate license plate location in complex scenes and super-resolution reconstruction of low-resolution plate image regions. The main research content and contributions are as follows:
(1) For license plate location: owing to equipment limitations, bad weather, and poor lighting, captured plate images have low resolution and carry little plate information, which makes localization inaccurate. To address this, a license plate location network based on pyramid split attention (PSA-YOLO) is designed. First, to reduce interference from complex backgrounds, a lightweight and efficient Pyramid Split Attention (PSA) module is introduced to capture the spatial information of feature maps at different scales in the plate image. Second, to fuse multi-scale feature information simply and quickly, a weighted bi-directional feature pyramid network is adopted in the network's bottleneck layer. Finally, a Transformer-based detection head is introduced to locate small plate targets more accurately. Experiments show that PSA-YOLO locates plates more accurately in complex scenes, laying the foundation for the subsequent super-resolution reconstruction of plate images.
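
To make the PSA idea concrete, the following is a minimal PyTorch sketch of a pyramid split attention block. It is an illustration following the general EPSANet design, not the thesis's exact configuration; the kernel sizes and reduction ratio are assumptions.

```python
# Illustrative sketch of a Pyramid Split Attention (PSA) block, following
# the general EPSANet design; hyperparameters are assumptions, not the
# thesis's exact configuration.
import torch
import torch.nn as nn

class SEWeight(nn.Module):
    """Squeeze-and-excitation weights for one channel group."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.fc(self.pool(x))

class PSABlock(nn.Module):
    """Split channels into groups, convolve each group at a different
    kernel size, then reweight the groups with softmax-normalized SE
    attention so the scales compete for emphasis."""
    def __init__(self, channels, kernels=(3, 5, 7, 9)):
        super().__init__()
        assert channels % len(kernels) == 0
        self.groups = len(kernels)
        c = channels // self.groups
        self.convs = nn.ModuleList(
            nn.Conv2d(c, c, k, padding=k // 2) for k in kernels)
        self.se = nn.ModuleList(SEWeight(c) for _ in kernels)

    def forward(self, x):
        b, _, h, w = x.shape
        chunks = x.chunk(self.groups, dim=1)   # split along channels
        feats = [conv(ch) for conv, ch in zip(self.convs, chunks)]
        attn = torch.stack([se(f) for se, f in zip(self.se, feats)], dim=1)
        attn = torch.softmax(attn, dim=1)      # scales compete per channel
        out = torch.stack(feats, dim=1) * attn # (B, groups, C/groups, H, W)
        return out.reshape(b, -1, h, w)
```

The softmax across groups lets the different kernel-size branches compete, which is what lets multi-scale context suppress background responses relative to the plate region.
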
(2) Traditional super-resolution models struggle with low-resolution license plate recognition, and reconstructing plate text images in complex scenes must contend with background interference, highly variable characters, and the need to preserve the semantic sequence of the text. To address this, a generative adversarial network for text image super-resolution (TextGAN) is proposed. First, a multi-scale feature fusion structure is added to the network's residual blocks to extract richer character detail from low-resolution images, and a Transformer-based multi-dimensional attention module is designed that fuses channel and spatial attention to capture cross-channel interactions, making the network concentrate on text regions and reducing the interference of irrelevant background on character reconstruction. Second, the CARAFE upsampling method is adopted to exploit image context and reassemble features according to the input content. Finally, a discriminator network supervises the generator to reconstruct clearer, sharper characters. Experiments show that TextGAN achieves better visual quality on the TextZoom dataset: compared with the STT method, its average recognition accuracy under the ASTER, MORAN, and CRNN recognizers improves by 0.6%, 0.6%, and 1.1%, respectively. On the license plate dataset it also reconstructs better: compared with the TSRN and STT networks, the LPR-Net recognition rate improves by 4.34% and 2.76%, respectively.
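
The fusion of channel and spatial attention can be sketched minimally as below, in the spirit of CBAM-style blocks; the thesis's Transformer-based multi-dimensional module is not reproduced here, and the reduction ratio and 7x7 kernel are assumptions.

```python
# Minimal sketch of fusing channel and spatial attention (CBAM-style),
# illustrating the idea behind the multi-dimensional attention module;
# not the thesis's Transformer-based implementation.
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # Channel attention: global pooling -> bottleneck MLP -> sigmoid.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: channel-wise avg/max maps -> 7x7 conv -> sigmoid.
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel(x)             # which channels respond to text
        avg = x.mean(dim=1, keepdim=True)   # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * self.spatial(torch.cat([avg, mx], dim=1))  # where text is
```

Here channel attention selects which feature channels respond to characters, and spatial attention localizes where the text sits, which together steer reconstruction away from irrelevant background.
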
(3) Building on the proposed PSA-YOLO location model and TextGAN text image super-resolution model, a license plate image super-resolution software system with a B/S (browser/server) architecture is designed and implemented. It provides plate location and super-resolution enhancement and visualizes the results.
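
As a hedged sketch of how the server side of such a B/S system could expose the locate-then-enhance pipeline over HTTP (the route, request field, and the locate_plates / enhance_plate helpers are hypothetical placeholders, not the thesis's actual implementation):

```python
# Minimal Flask sketch of a B/S pipeline endpoint; all names here are
# illustrative placeholders, not the thesis's implementation.
from flask import Flask, jsonify, request

app = Flask(__name__)

def locate_plates(image_bytes: bytes) -> list:
    """Stub standing in for PSA-YOLO inference; returns [x1, y1, x2, y2] boxes."""
    return []

def enhance_plate(image_bytes: bytes, box: list) -> bytes:
    """Stub standing in for TextGAN super-resolution on one cropped plate."""
    return image_bytes

@app.route("/plates/enhance", methods=["POST"])
def enhance():
    # Browser uploads an image; server runs detection, then SR on each crop.
    image_bytes = request.files["image"].read()
    boxes = locate_plates(image_bytes)
    crops = [enhance_plate(image_bytes, b) for b in boxes]
    return jsonify({"boxes": boxes, "plates_enhanced": len(crops)})

if __name__ == "__main__":
    app.run(port=8000)
```
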

Abstract (English):

With the arrival of the big data era, object detection and super-resolution reconstruction have developed rapidly. Accurately locating license plates in vehicle images captured in complex scenes and obtaining effective plate information play a vital role in criminal investigation and in the construction of smart urban traffic. This thesis therefore studies accurate license plate location in complex scenes and super-resolution reconstruction of low-resolution plate image regions. The main content and innovations are as follows: (1) For license plate location, equipment limitations, bad weather, and poor lighting leave captured plate images with low resolution and little plate information, making localization inaccurate. A license plate location network based on pyramid split attention (PSA-YOLO) is designed to locate plates accurately in complex scenes. First, to reduce the interference of complex backgrounds, a lightweight and efficient Pyramid Split Attention (PSA) module is introduced to capture and exploit the spatial information of feature maps at different scales in the plate image. Second, to fuse the multi-scale feature information of the plate image simply and quickly, a weighted bi-directional feature pyramid network is adopted in the network's bottleneck layer. Finally, a Transformer-based detection head is introduced to locate the plate target more accurately. Experiments show that PSA-YOLO accurately locates plates in complex scenes, laying a foundation for the super-resolution reconstruction of plate images.
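
The weighted bi-directional fusion mentioned above can be illustrated with the "fast normalized fusion" rule from EfficientDet's BiFPN; this is a generic sketch, not the thesis's exact layer.

```python
# Illustrative sketch of fast normalized fusion (BiFPN, EfficientDet):
# learnable non-negative weights decide how much each scale contributes.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        # inputs: feature maps already resized to a common shape.
        w = torch.relu(self.weight)        # keep contributions non-negative
        w = w / (w.sum() + self.eps)       # normalize without a softmax
        return sum(wi * x for wi, x in zip(w, inputs))

# Example: fuse two pyramid levels of matching shape.
fuse = WeightedFusion(2)
out = fuse([torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)])
```

Avoiding the softmax keeps the fusion cheap, which is why this rule suits the "simple and fast" multi-scale fusion goal stated above.
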

(2) Traditional models struggle to handle low-resolution license plate image recognition, and reconstructing plate text images in complex scenes must cope with background interference, changeable characters, and the preservation of text semantic sequence information. We propose a text image super-resolution model based on a generative adversarial network (TextGAN). First, a multi-scale feature fusion structure is added to the residual blocks to extract richer character detail from low-resolution images, and a Transformer-based multi-dimensional attention module is designed that integrates channel and spatial attention to capture cross-channel interaction, so the network focuses on the text in the image and the interference of irrelevant background on character reconstruction is reduced. Then, the CARAFE upsampling method makes effective use of image context and reassembles the feature maps according to the input content. Finally, a discriminator supervises the generator to reconstruct clearer and sharper characters. Experiments show that TextGAN achieves better visual quality for text image reconstruction on the TextZoom dataset. Compared with the STT method, the average recognition accuracy of TextGAN under the ASTER, MORAN, and CRNN recognition networks improves by 0.6%, 0.6%, and 1.1%, respectively. On the license plate dataset, TextGAN reconstructs plate text images well: compared with the TSRN and STT networks, the LPR-Net recognition rate improves by 4.34% and 2.76%, respectively. (3) Finally, based on the PSA-YOLO location network and the TextGAN text image super-resolution network proposed in this work, a software system based on a B/S architecture is designed and implemented. It locates and enhances license plate images and visualizes the results.
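
For concreteness, below is a simplified sketch of CARAFE-style content-aware upsampling as referenced above; the shapes follow the CARAFE paper, but the hyperparameters are assumptions and this is an illustration, not the thesis's implementation.

```python
# Simplified CARAFE-style upsampling: a small convolution predicts a
# per-position reassembly kernel from the input content, which reweights
# a local neighborhood instead of using a fixed bilinear kernel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CarafeUpsample(nn.Module):
    def __init__(self, channels, scale=2, k_up=5, compressed=64):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        self.compress = nn.Conv2d(channels, compressed, 1)
        # Predict one k_up x k_up kernel per upsampled output position.
        self.encoder = nn.Conv2d(compressed, scale * scale * k_up * k_up,
                                 kernel_size=3, padding=1)

    def forward(self, x):
        b, c, h, w = x.shape
        s, k = self.scale, self.k_up
        # 1. Predict and normalize reassembly kernels from the content.
        kernels = self.encoder(self.compress(x))      # (B, s*s*k*k, H, W)
        kernels = F.pixel_shuffle(kernels, s)         # (B, k*k, sH, sW)
        kernels = F.softmax(kernels, dim=1)
        # 2. Gather the k x k neighborhood feeding each output position.
        patches = F.unfold(x, k, padding=k // 2)      # (B, C*k*k, H*W)
        patches = patches.view(b, c * k * k, h, w)
        patches = F.interpolate(patches, scale_factor=s, mode="nearest")
        patches = patches.view(b, c, k * k, s * h, s * w)
        # 3. Reassemble: weighted sum over each local neighborhood.
        return (patches * kernels.unsqueeze(1)).sum(dim=2)

# Example: upsample a small plate feature map by 2x.
up = CarafeUpsample(channels=64)
y = up(torch.randn(1, 64, 8, 32))   # -> (1, 64, 16, 64)
```
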


CLC Number:

TP391.4

Open Access Date:

2023-01-09
