Thesis title (Chinese): | 基于阵列分区的动态自重构处理器结构研究与实现 |
Author name: | |
Student ID: | 20206035034 |
Confidentiality level: | Restricted (open after 1 year) |
Thesis language: | Chinese |
Discipline code: | 080903 |
Discipline: | Engineering - Electronic Science and Technology (Engineering or Science degree conferrable) - Microelectronics and Solid-State Electronics |
Student type: | Master's |
Degree: | Master of Engineering |
Degree year: | 2023 |
Institution: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research direction: | Integrated circuit design |
First advisor: | |
First advisor's affiliation: | |
Submission date: | 2023-06-19 |
Defense date: | 2023-06-01 |
Thesis title (English): | Research and Implementation of a Dynamic Self-Reconfigurable Processor Based on Array Partitioning |
Keywords (Chinese): | 可重构阵列结构 ; 区域性自重构 ; 神经网络加速器 ; Transformer网络 ; 路径规划 |
Keywords (English): | Reconfigurable array structure; regional self-reconfiguration; neural network accelerator; Transformer network; route planning |
Abstract (Chinese, translated): |
With the expanding application of neural network algorithms across more fields, computational tasks in complex scenarios place higher performance demands on traditional computing architectures. Reconfigurable structures, with their advantages in power efficiency, area efficiency, and flexibility, are regarded as an ideal hardware implementation platform for neural network algorithms. However, implementing such artificial intelligence algorithms on reconfigurable array structures suffers from low computational efficiency, long reconfiguration times, and poor flexibility. Building on a programmable, dynamically self-reconfigurable 3D array chip architecture, this thesis proposes a processor structure based on array partitioning and a corresponding dynamic self-reconfiguration method. The structure improves the computational efficiency of the reconfigurable array for neural network algorithms while effectively increasing the flexibility of the dynamically self-reconfigurable array structure and enabling regional dynamic self-reconfiguration of the array.
First, accelerator structures are designed for the large-scale matrix operations, floating-point operations, and multiply-accumulate operations that are pervasive in artificial intelligence algorithms, including a large-scale matrix multiplier, a floating-point multiplier based on the Karatsuba algorithm, and a distributed Vedic multiply-accumulator. These units accelerate the reconfigurable array processor on such computational tasks without additional hardware overhead and give the array structure the ability to support efficient software reconfiguration of neural network algorithms. FPGA test results show that, with comparable hardware overhead, the average matrix-operation throughput of the array structure is 20.15% higher than that of the original structure.
Second, a processing element structure supporting regional array reconfiguration is proposed, with modules for read/write response and partition activation; a five-stage pipeline adapted to neural network applications is proposed, together with the corresponding instruction set architecture; and the matrix-operation and special-operation instructions required for software reconfiguration of neural network algorithms are added. The array processing elements thus gain partitioned-configuration capability, and instruction execution time is reduced. Simulation results on large-scale matrix-operation instruction tests show that the processing element structure supporting regional array reconfiguration improves instruction-processing efficiency by 9.45% over the original structure.
Third, a self-reconfiguration method based on dynamic array partitioning is proposed, and the key modules required for dynamic partitioning are designed; a state-collection method based on regional execution-state feedback and a hierarchical dynamic-partition configuration method are proposed, and the overall structure of the dynamic self-reconfigurable processor based on array partitioning is designed. FPGA test results show that the 16-partition mode reduces state-information transmission by 24.21% and saves 8.21% of the execution cycles consumed during reconfiguration.
Finally, a prototype system for array-partitioned dynamic self-reconfiguration of neural network algorithms is developed; a Transformer network compression method suited to the reconfigurable array's parallel computing architecture is studied; a layered mapping scheme and partition configuration modes for the Transformer encoder and decoder layers are designed; and a VRP route-planning platform is built to evaluate the experimental results. FPGA test results show that the ED-Transformer achieves software reconfiguration on the array-partitioned dynamic self-reconfigurable structure, with a maximum average route-planning error of only 1.43%. |
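The abstract names a floating-point multiplier based on the Karatsuba algorithm. As an illustration of the underlying recurrence only (the thesis implements this in hardware, presumably on mantissas; the integer form and function name here are my own), a minimal recursive sketch: Karatsuba replaces the four half-width products of schoolbook multiplication with three, at the cost of extra additions.

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply two non-negative integers with Karatsuba's recurrence:
    three half-size multiplications instead of four."""
    if x < 16 or y < 16:               # small operands: multiply directly
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    xh, xl = x >> half, x & mask       # split each operand into two halves
    yh, yl = y >> half, y & mask
    p_hi = karatsuba(xh, yh)           # high*high
    p_lo = karatsuba(xl, yl)           # low*low
    # middle term recovered from one product of the half-sums
    p_mid = karatsuba(xh + xl, yh + yl) - p_hi - p_lo
    return (p_hi << (2 * half)) + (p_mid << half) + p_lo
```

In a hardware floating-point unit, a single level of this split is typically applied to the significand product, trading one wide multiplier for narrower ones plus adders; that trade-off is plausibly what the thesis exploits, though the exact decomposition depth is not stated in the abstract.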
Abstract (English): |
With the wide application of neural network algorithms in more fields, the computational tasks in complex scenarios put higher performance requirements on traditional computing architectures. Reconfigurable structures are considered ideal hardware implementation platforms for neural network algorithms owing to their advantages in power efficiency, area efficiency, and flexibility. However, the implementation of such artificial intelligence algorithms on reconfigurable array structures suffers from low computational efficiency, long reconfiguration time, and poor flexibility. Based on the programmable dynamic self-reconfigurable 3D array chip architecture, this thesis proposes a processor architecture based on array partitioning and a corresponding dynamic self-reconfiguration method. The structure enhances the computational efficiency of the reconfigurable array for neural network algorithms while effectively improving the flexibility of the dynamic self-reconfigurable array structure and realizing regional dynamic self-reconfiguration of the array. Firstly, accelerator structures are designed for large-scale matrix operations, floating-point operations, and multiply-accumulate operations, which are common in artificial intelligence algorithms. These include a large-scale matrix multiplier, a floating-point multiplier based on the Karatsuba algorithm, and a distributed Vedic multiply-accumulator. They effectively accelerate the reconfigurable array processor on such computational tasks while avoiding additional hardware overhead, and give the array structure the ability to support efficient software reconfiguration of neural network algorithms. FPGA test results show that, with similar hardware overhead, the average rate of matrix processing in the array structure is 20.15% higher than that of the original structure.
Secondly, a processing element structure supporting regional array reconfiguration is proposed, and modules such as read/write response and partition activation are designed; a five-stage pipeline structure adapted to neural network applications is proposed, and the corresponding instruction set architecture is designed; matrix operation and special operation instructions required for software reconfiguration of neural network algorithms are added. The array processing elements are thus equipped with partitioned configuration capability, and instruction execution time is reduced. Simulation results show that, in large-scale matrix operation instruction tests, the instruction processing efficiency of the processing element structure supporting regional array reconfiguration is improved by 9.45% compared with the original structure. Thirdly, a self-reconfiguration method based on dynamic partitioning of the array is proposed, and the key modules required for dynamic partitioning are designed; a state collection method based on regional execution state feedback and a hierarchical dynamic partition configuration method are proposed, and the overall structure of the dynamic self-reconfigurable processor based on array partitioning is designed. FPGA test results show that the 16-partition mode reduces state information transmission by 24.21% and saves 8.21% of the execution cycles consumed during reconfiguration. Finally, a prototype system for array-partitioned dynamic self-reconfiguration of neural network algorithms is developed, and a Transformer network compression method for the reconfigurable array parallel computing architecture is studied; the layered mapping scheme and partition configuration modes of the Transformer encoder and decoder layers are designed, and a VRP route planning platform is built to test the experimental results.
The results show that the ED-Transformer achieves software reconfiguration on the array-partitioned dynamic self-reconfigurable structure, and the maximum average error of route planning is only 1.43%. |
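The distributed Vedic multiply-accumulator mentioned in both abstracts is presumably built on the Urdhva-Tiryagbhyam ("vertically and crosswise") sutra, in which every output column is the sum of crosswise digit products, so all partial products of a column can be formed in parallel before a single carry-propagation pass. A behavioral sketch under that assumption, using decimal digits for readability (hardware versions operate on binary partial-product arrays; `vedic_multiply` is an illustrative name, not the thesis's module):

```python
def vedic_multiply(a: int, b: int, base: int = 10) -> int:
    """Urdhva-Tiryagbhyam multiplication: gather each column's crosswise
    digit products in parallel, then resolve carries column by column."""
    da = [int(d) for d in str(a)][::-1]   # least-significant digit first
    db = [int(d) for d in str(b)][::-1]
    cols = [0] * (len(da) + len(db) - 1)
    for i, x in enumerate(da):            # all crosswise partial products;
        for j, y in enumerate(db):        # independent, hence parallelizable
            cols[i + j] += x * y
    result, carry = 0, 0
    for k, c in enumerate(cols):          # single carry-propagation pass
        carry, digit = divmod(c + carry, base)
        result += digit * base ** k
    return result + carry * base ** len(cols)
```

A multiply-accumulate then just sums such products, e.g. `acc = sum(vedic_multiply(x, w) for x, w in pairs)`; distributing the column sums across processing elements is the kind of parallelism the thesis's accumulator likely targets, though the abstract does not detail the mapping.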
CLC number: | TN492 |
Open access date: | 2024-06-19 |