Thesis Information

Title:

 Research and Design of a Target Detection Application Based on a Self-Reconfigurable AI Chip

Author:

 Song Jia

Student ID:

 21206223049

Confidentiality Level:

 Classified (open after 4 years)

Language:

 Chinese

Discipline Code:

 085400

Discipline:

 Engineering - Electronic Information

Student Type:

 Master's

Degree:

 Master of Engineering

Degree Year:

 2024

University:

 Xi'an University of Science and Technology

School:

 School of Electrical and Control Engineering

Major:

 Control Engineering

Research Direction:

 Integrated Circuit Design

Supervisor:

 Jiang Lin

Supervisor's Institution:

 Xi'an University of Science and Technology

Submission Date:

 2024-06-24

Defense Date:

 2024-06-05

Keywords:

 Data generation; Lightweight neural networks; Data reuse; YOLOv5; Self-reconfigurable structures

Abstract:

In recent years, target detection for UAV aerial images has been widely applied in fields such as rescue and disaster relief and UAV-based inspection, making these a new application scenario for target detection technology. As model detection performance improves, the computational cost and parameter count of the models keep growing, placing higher demands on the computing capability of conventional processors. With their advantages in power efficiency, area efficiency, and flexibility, self-reconfigurable chips are regarded as an ideal hardware implementation platform for neural network algorithms. Based on a self-reconfigurable AI chip, this thesis studies target detection applications for Earth observation scenarios from the following aspects.

Firstly, to address the shortage of UAV aerial image data, a data generation method for Earth observation scenes is studied. A self-attention mechanism is introduced into the generator so that it can focus on specific target features and exploit image context to generate higher-quality images. Depthwise separable convolutions replace the standard convolutions in the discriminator to improve training efficiency. Experimental results show that the improved model reduces the FID score by an average of 39.37 compared with the original network, effectively improving the quality of the generated images. When the generated dataset is applied to aerial-image target detection, the mean average precision of the trained YOLOv5 and YOLOv8 networks improves by 50.2% and 42.6%, respectively.
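
The parameter saving from this substitution can be sketched with a simple count (a rough illustration only; the layer size below is a hypothetical example, not a layer from the thesis's discriminator):

```python
def standard_conv_params(c_in, c_out, k):
    # A standard k x k convolution learns a k x k x c_in kernel
    # for each of the c_out output channels.
    return c_out * c_in * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise step: one k x k filter per input channel,
    # followed by a 1 x 1 pointwise convolution that mixes channels.
    return c_in * k * k + c_in * c_out

# Hypothetical 3x3 layer with 128 input and 128 output channels.
std = standard_conv_params(128, 128, 3)        # 147456 parameters
sep = depthwise_separable_params(128, 128, 3)  # 17536 parameters
print(f"reduction: {std / sep:.1f}x")
```

For typical layer widths the separable form needs roughly an order of magnitude fewer weights, which is what shortens discriminator training.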

Secondly, to reduce the storage capacity required when deploying the target detection network on the self-reconfigurable AI chip, the computational cost and parameter count of the YOLOv5 network are analysed and a model compression scheme is designed for it. The network is first trained with sparsity regularisation, and channels of low importance are pruned. A fine-tuning scheme based on knowledge distillation then restores the accuracy of the pruned model. Finally, the proposed compression scheme is validated: after lightweighting, the model's parameter count and computation fall to 61.5% and 67.1% of the original, with only a 3.6% drop in accuracy.
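
A minimal sketch of the channel-selection step, assuming (as in the common network-slimming approach, which may differ in detail from the thesis's scheme) that sparsity training drives the batch-norm scale factors of unimportant channels toward zero:

```python
import numpy as np

def select_channels(gamma, keep_ratio):
    """Keep the fraction `keep_ratio` of channels whose batch-norm
    scale |gamma| is largest; the remaining channels are pruned."""
    n_keep = max(1, int(len(gamma) * keep_ratio))
    order = np.argsort(-np.abs(gamma))   # most important first
    return np.sort(order[:n_keep])       # kept channel indices, in order

# Hypothetical per-channel scales after sparsity training.
gamma = np.array([0.9, 0.01, 0.5, 0.001, 0.7, 0.02])
print(select_channels(gamma, 0.5))  # -> [0 2 4]
```

Fine-tuning with a distillation loss against the unpruned teacher then recovers most of the accuracy lost by removing the weak channels.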

Then, to address the repeated access of large amounts of data during convolutional operations, a data reuse scheme for the YOLOv5 network on the self-reconfigurable structure is proposed. Exploiting the characteristics of the self-reconfigurable array, a dataflow suited to that structure is designed; by changing how data are mapped, weight data are reused efficiently during convolution. From the perspectives of loop tiling and loop interchange, the reuse pattern is optimised to reduce the number of memory accesses and improve the efficiency of the convolution computation. The lightweighted YOLOv5 network is simulated and analysed under this reuse pattern. Experimental results show that implementing the YOLOv5 network on the self-reconfigurable structure with this reuse strategy reduces the number of memory accesses by up to 96.11%.
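
The effect of weight reuse can be sketched with a simple access-counting model (the layer shape below is a hypothetical example, and the model deliberately ignores input/output traffic and on-chip buffer sizes):

```python
def weight_fetches_naive(oh, ow, c_out, c_in, k):
    # Kernel loops innermost: every weight is re-fetched for every
    # output position, i.e. one memory fetch per multiply.
    return oh * ow * c_out * c_in * k * k

def weight_fetches_stationary(oh, ow, c_out, c_in, k):
    # After loop interchange the output loops sit innermost, so each
    # weight stays resident in a PE while it sweeps the whole output
    # plane: one fetch per weight.
    return c_out * c_in * k * k

naive = weight_fetches_naive(52, 52, 64, 32, 3)
reused = weight_fetches_stationary(52, 52, 64, 32, 3)
print(f"weight fetches cut by {100 * (1 - reused / naive):.2f}%")
```

Loop tiling plays the same role when the weights of a layer do not all fit on chip: it bounds the working set so that each tile's weights are fetched once rather than once per output.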

Finally, a self-reconfigurable AI chip prototype system for Earth observation scenarios is developed, and an FPGA test platform is built to test and evaluate the YOLOv5 network. The network model is first quantised to 16-bit fixed point, and its parameters are reordered in DDR. The YOLOv5 network is then analysed and a parallel implementation scheme on the self-reconfigurable AI chip is proposed. Finally, the prototype system is implemented on a Virtex UltraScale 440 development board and subjected to functional testing and performance analysis. Experimental results show that, compared with a GPU system, the self-reconfigurable AI chip prototype improves network inference speed by 53.68%.
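
A minimal sketch of 16-bit fixed-point quantisation (the choice of 12 fractional bits and the sample weights are illustrative assumptions; the thesis's actual scaling scheme is not reproduced here):

```python
import numpy as np

def to_fixed16(x, frac_bits):
    # Scale by 2**frac_bits, round, and saturate to the int16 range.
    scale = 1 << frac_bits
    return np.clip(np.round(x * scale), -32768, 32767).astype(np.int16)

def from_fixed16(q, frac_bits):
    # Inverse mapping back to floating point, e.g. for checking the
    # quantisation error of each weight.
    return q.astype(np.float32) / (1 << frac_bits)

w = np.array([0.4375, -1.25, 0.003], dtype=np.float32)
q = to_fixed16(w, 12)   # 12 fractional bits -> resolution of 2**-12
print(q, from_fixed16(q, 12))
```

With 12 fractional bits every weight is represented to within 2**-13 of its float value, while halving the storage and DDR bandwidth relative to 32-bit floats.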

References:

[1] Dubrovinskaya E, Tuhtan J A. This fish does not exist: fish species image augmentation using stable diffusion[C]//OCEANS 2023-Limerick, Piscataway, NJ: IEEE, 2023: 1-6.

[2] Oshiba J, Iwata M, Kise K. Face image generation of anime characters using an advanced first order motion model with facial landmarks[J]. IEICE Transactions on Information and Systems, 2023, 106(1): 22-30.

[3] Bilakeri S, Kotegar K. Strong baseline with auto-encoder for scale-invariant person re-identification[C]//2022 International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER). Piscataway, NJ: IEEE, 2022: 1-6.

[4] Fang Y, Zhang X, Cao H, et al. Insulator Image Dataset Generation based on Generative Adversarial Network[C]//2023 4th International Conference on Computer Vision, Image and Deep Learning (CVIDL). Piscataway, NJ: IEEE, 2023: 103-108.

[5] Kasi G, Abirami S, Lakshmi R D. A Deep Learning Based Cross Model Text to Image Generation using DC-GAN[C]//2023 12th International Conference on Advanced Computing (ICoAC). Piscataway, NJ: IEEE, 2023: 1-6.

[6] Hostin M A, Sivtsov V, Attarian S, et al. ConText-GAN: Controllable context image generation using GANs[C]//2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI). Piscataway, NJ: IEEE, 2023: 1-5.

[7] Zheng H, Wang Q. Using Image Pre-Processing to Improve Navigation Line Extraction Based on Pix2Pix Net on Small-size Datasets[C]//2023 IEEE International Conference on Industrial Technology (ICIT). Piscataway, NJ: IEEE, 2023: 1-6.

[8] Zhao C, Li Q, He X, et al. Data Augmentation of Discrete Sequential Protocol Messages Based on Recurrent Generative Adversarial Networks[C]//2022 2nd International Conference on Consumer Electronics and Computer Engineering (ICCECE). Piscataway, NJ: IEEE, 2022: 393-400.

[9] Mao W, Yang P, Wang Z. Fta-gan: A computation-efficient accelerator for gans with fast transformation algorithm[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 34(6): 2978-2992.

[10] Jin S, Qi N, Zhu Q, et al. Progressive GAN-based transfer network for low-light image enhancement[C]//International Conference on Multimedia Modeling. Cham: Springer, 2022: 292-304.

[11] Hariharan B, Karthic S, Nalina E, et al. Hybrid deep convolutional generative adversarial networks (DCGANS) and style generative adversarial network (STYLEGANS) algorithms to improve image quality[C]//2022 3rd International Conference on Electronics and Sustainable Communication Systems (ICESC). Piscataway, NJ: IEEE, 2022: 1182-1186.

[12] Zhu T, Chen J, Zhu R, et al. StyleGAN3: generative networks for improving the equivariance of translation and rotation[J]. arXiv preprint arXiv:2307.03898, 2023.

[13] Lin W, Tianyong A, Le F, et al. Design of a YOLO Model Accelerator Based on PYNQ Architecture[C]//2022 International Conference on Machine Learning and Intelligent Systems Engineering (MLISE).Guangzhou, China: IEEE, 2022: 15-18.

[14] Zehao W, Zhicheng Z, Jiangqin X, et al. Lightweight Micro-Expression Recognition Based on Structured and Unstructured Pruning[C]//2023 8th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC).Beijing, China: IEEE, 2023: 163-166.

[15] Kuang J, Shao M, Wang R, et al. Network pruning via probing the importance of filters[J]. International Journal of Machine Learning and Cybernetics, 2022, 13(9): 2403-2414.

[16] Li H, Yue X, Wang Z, et al. Optimizing the deep neural networks by layer-wise refined pruning and the acceleration on FPGA[J]. Computational Intelligence and Neuroscience, 2022: 8039281.

[17] Li H, Yue X, Wang Z, et al. A survey of Convolutional Neural Networks—From software to hardware and the applications in measurement[J]. Measurement: Sensors, 2021, 18: 100080.

[18] Sawant S S, Bauer J, Erick F X, et al. An optimal-score-based filter pruning for deep convolutional neural networks[J]. Applied Intelligence, 2022, 52(15): 17557-17579.

[19] Wang Z, Zhang Z, Yin J, et al. Lightweight Micro-Expression Recognition Based on Structured and Unstructured Pruning[C]//2023 8th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC). Piscataway, NJ: IEEE, 2023: 163-166.

[20] Guo Y, Yao A, Chen Y. Dynamic network surgery for efficient dnns[J]. Advances in neural information processing systems, 2016, 29.

[21] Neill J O, Dutta S, Assem H. Aligned weight regularizers for pruning pretrained neural networks[J]. arXiv preprint arXiv:2204.01385, 2022.

[22] Gysel P, Pimentel J, Motamedi M, et al. Ristretto: A framework for empirical study of resource-efficient inference in convolutional neural networks[J]. IEEE transactions on neural networks and learning systems, 2018, 29(11): 5784-5789.

[23] Tailor S A, Fernandez-Marques J, Lane N D. Degree-quant: Quantization-aware training for graph neural networks[J]. arXiv preprint arXiv:2008.05000, 2020.

[24] Garg S, Lou J, Jain A, et al. Dynamic precision analog computing for neural networks[J]. IEEE Journal of Selected Topics in Quantum Electronics, 2022, 29(2): 1-12.

[25] Banner R, Nahshan Y, Soudry D. Post training 4-bit quantization of convolutional networks for rapid-deployment[J]. Advances in Neural Information Processing Systems, 2019, 32.

[26] Finkelstein A, Almog U, Grobman M. Fighting quantization bias with bias[J]. arXiv preprint arXiv:1906.03193, 2019.

[27] Meller E, Finkelstein A, Almog U, et al. Same, same but different: Recovering neural network quantization error through weight factorization[C]//International Conference on Machine Learning. New York, NY: PMLR, 2019: 4486-4495.

[28] Nagel M, Baalen M, Blankevoort T, et al. Data-free quantization through weight equalization and bias correction[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway, NJ: IEEE, 2019: 1325-1334.

[29] Lin S, Ji R, Chen C, et al. Holistic cnn compression via low-rank decomposition with knowledge transfer[J]. IEEE transactions on pattern analysis and machine intelligence, 2018, 41(12): 2889-2905.

[30] Cheng Q, Li J, Gao X L, et al. A lightweight deep neural network method based on deep sparse low-rank decomposition[J]. Control and Decision, 2023, 38(3): 751-758.

[31] Sun B, Li J, Shao M, et al. LRPRNet: Lightweight deep network by low-rank pointwise residual convolution[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 34(8): 4440-4450.

[32] Dai W, Fan J, Miao Y, et al. Deep Learning Model Compression With Rank Reduction in Tensor Decomposition[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023:3330542.

[33] Chen Z, Zhang L, Cao Z, et al. Distilling the knowledge from handcrafted features for human activity recognition[J]. IEEE Transactions on Industrial Informatics, 2018, 14(10): 4334-4342.

[34] Yim J, Joo D, Bae J, et al. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 4133-4141.

[35] Lee S H, Kim D H, Song B C. Self-supervised knowledge distillation using singular value decomposition[C]//Proceedings of the European Conference on Computer Vision (ECCV). Piscataway, NJ: IEEE, 2018: 335-350.

[36] Zhang C, Peng Y. Better and faster: knowledge transfer from multiple self-supervised learning tasks via graph distillation for video classification[J]. arXiv preprint arXiv:1804.10069, 2018.

[37] Soebandhi S. Multisensory Culinary Image Classification based on SqueezeNet and Support Vector Machine[J].Proceedings of the IEEE Information Technology International Seminar, 2023: 104201111.

[38] Howard A G, Zhu M. MobileNets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

[39] Zhang X, Zhou X, Lin M, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 6848-6856.

[40] Han K, Wang Y, Tian Q, et al. Ghostnet: More features from cheap operations[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. Piscataway, NJ: IEEE, 2020: 1580-1589.

[41] Ma N, Zhang X, Huang J, et al. WeightNet: Revisiting the design space of weight networks[C]//European Conference on Computer Vision. Cham: Springer, 2020: 776-792.

[42] Wang J H, Xu S G, Lu H J. A lightweight network design method based on fast downsampling and its application to face recognition[J]. Acta Electronica Sinica, 2023, 51(8): 2226-2237.

[43] Huang X, Xu H, Chen X, et al. Fast and High-Accuracy Approximate MAC Unit Design for CNN Computing[J]. IEEE Embedded Systems Letters, 2022, 14(3): 155-158.

[44] Li H, Gong L, Wang C, et al. A flexible dataflow CNN accelerator on FPGA[C]//2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW). Bangalore, India: IEEE, 2023: 302-304.

[45] Zhang X S, Song J, Song S J, et al. Biological-vision-inspired adaptive enhancement of low-illumination video with FPGA-accelerated implementation[J]. Journal of Electronics & Information Technology, 2023, 45(8): 2739-2748.

[46] Li H, Gong L, Wang C, et al. A flexible dataflow CNN accelerator on FPGA[C]//2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW). IEEE, 2023: 302-304.

[47] Wang C, Wang Z, Li S, et al. EWS: An Energy-Efficient CNN Accelerator With Enhanced Weight Stationary Dataflow[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2024: 3359511.

[48] Zhao Z, Cao R, Un K F, et al. An fpga-based transformer accelerator using output block stationary dataflow for object recognition applications[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2022, 70(1): 281-285.

[49] Kim D, Jeong S, Kim J Y. Agamotto: A performance optimization framework for CNN accelerator with row stationary dataflow[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2023:3258411.

[50] Kwon Y, Rhu M. A case for memory-centric HPC system architecture for training deep neural networks[J]. IEEE computer architecture letters, 2018, 17(2): 134-138.

[51] Yue J L, Tian S C. Data and Hardware Efficient Design for Convolutional Neural Network[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2018, 65(5): 1642-1651.

[52] Yang J, Zheng H, Louri A. Venus: A versatile deep neural network accelerator architecture design for multiple applications[C]//2023 60th ACM/IEEE Design Automation Conference (DAC). San Francisco, CA: IEEE, 2023: 1-6.

[53] Kwon H, Pellauer M, Parashar A, et al. Flexion: A Quantitative Metric for Flexibility in DNN Accelerators[J]. IEEE Computer Architecture Letters, 2021, 20(1): 1-4.

[54] Jiang Z, Li J, Liu F, et al. A systematic study on benchmarking AI inference accelerators[J].CCF Transactions on High Performance Computing, 2022,4(2): 87-103.

[55] Tian R, Ji R, Bai C, et al. A Vision-Based Ground Moving Target Tracking System for Quadrotor UAVs[C]//2023 IEEE International Conference on Unmanned Systems (ICUS). Piscataway, NJ: IEEE, 2023: 1750-1754.

[56] Ji S, Tang H, Ming Y, et al. Design of high automatic target recognition unmanned reconnaissance system based on YOLOv5[C]//2022 IEEE 4th International Conference on Civil Aviation Safety and Information Technology (ICCASIT). Piscataway, NJ: IEEE, 2022: 1446-1449.

[57] Sanchez-Fernandez A J, Romero L F, Bandera G, et al. VPP: visibility-based path planning heuristic for monitoring large regions of complex terrain using a UAV onboard camera[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021, 15: 944-955.

[58] Deng J, Shi Z, Zhuo C. Energy-efficient real-time UAV object detection on embedded platforms[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019, 39(10): 3123-3127.

[59] Li M, Zhou G, Chen A, et al. FWDGAN-based data augmentation for tomato leaf disease identification[J]. Computers and Electronics in Agriculture, 2022, 194: 106779.

[60] Yan R, Yi J, He J, et al. FPGA-based Convolutional Neural Network Design and Implementation[C]//2023 3rd Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS). Piscataway, NJ: IEEE, 2023: 456-460.

[61] Liu J, Yang X, Meng S, et al. Design of Ship Target Detection System Based on FPGA[C]//2023 IEEE 6th International Conference on Electronic Information and Communication Technology (ICEICT). Piscataway, NJ: IEEE, 2023: 28-32.

[62] Ma X, Tang J, Bai Y. Locality-sensing Fast Neural Network (LFNN): An Efficient Neural Network Acceleration Framework via Locality Sensing for Real-time Videos Queries[C]//2023 24th International Symposium on Quality Electronic Design (ISQED). Piscataway, NJ: IEEE, 2023: 1-8.

CLC Number:

 TN492

Open Access Date:

 2028-06-25
