Thesis title (Chinese): | 基于阵列分区的动态自重构处理器结构研究与实现 |
Author name: | |
Student ID: | 20206035034 |
Confidentiality level: | Restricted (open after 1 year) |
Thesis language: | Chinese |
Discipline code: | 080903 |
Discipline: | Engineering - Electronic Science and Technology (Engineering or Science degree conferrable) - Microelectronics and Solid-State Electronics |
Student type: | Master's |
Degree: | Master of Engineering |
Degree year: | 2023 |
Institution: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research direction: | Integrated circuit design |
First advisor: | |
First advisor's affiliation: | |
Submission date: | 2023-06-19 |
Defense date: | 2023-06-01 |
Thesis title (English): | Research and Implementation of a Dynamic Self-Reconfigurable Processor Based on Array Partitioning |
Keywords (Chinese): | 可重构阵列结构 ; 区域性自重构 ; 神经网络加速器 ; Transformer网络 ; 路径规划 |
Keywords (English): | Reconfigurable array structure; regional self-reconfiguration; neural network accelerator; Transformer network; route planning |
Abstract (Chinese, translated): |
With the expanding application of neural network algorithms across more fields, computational tasks in complex scenarios place higher performance demands on traditional computing architectures. Reconfigurable structures, with their advantages in power efficiency, area efficiency, and flexibility, are regarded as an ideal hardware implementation platform for neural network algorithms. However, implementing such artificial intelligence algorithms on reconfigurable array structures suffers from low computational efficiency, long reconfiguration times, and poor flexibility. Building on a programmable, dynamically self-reconfigurable 3D array chip architecture, this thesis proposes a processor structure based on array partitioning and a corresponding dynamic self-reconfiguration method. The structure improves the computational efficiency of the reconfigurable array for neural network algorithms while effectively increasing the flexibility of the dynamically self-reconfigurable array structure and enabling regional dynamic self-reconfiguration of the array.
First, accelerator structures are designed for the large-scale matrix operations, floating-point operations, and multiply-accumulate operations that are pervasive in artificial intelligence algorithms, including a large-scale matrix multiplier, a floating-point multiplier based on the Karatsuba algorithm, and a distributed Vedic multiply-accumulator. These units accelerate the reconfigurable array processor on such computational tasks without additional hardware overhead and give the array structure the ability to support efficient software reconfiguration of neural network algorithms. FPGA test results show that, with comparable hardware overhead, the average matrix-operation throughput of the array structure is 20.15% higher than that of the original structure.
Second, a processing element structure supporting regional array reconfiguration is proposed, with modules for read/write response and partition activation; a five-stage pipeline adapted to neural network applications is proposed, together with the corresponding instruction set architecture; and the matrix-operation and special-operation instructions required for software reconfiguration of neural network algorithms are added. The array processing elements thus gain partitioned-configuration capability, and instruction execution time is reduced. Simulation results on large-scale matrix-operation instruction tests show that the processing element structure supporting regional array reconfiguration improves instruction-processing efficiency by 9.45% over the original structure.
Third, a self-reconfiguration method based on dynamic array partitioning is proposed, and the key modules required for dynamic partitioning are designed; a state-collection method based on regional execution-state feedback and a hierarchical dynamic-partition configuration method are proposed, and the overall structure of the dynamic self-reconfigurable processor based on array partitioning is designed. FPGA test results show that the 16-partition mode reduces state-information transmission by 24.21% and saves 8.21% of the execution cycles consumed during reconfiguration.
Finally, a prototype system for array-partitioned dynamic self-reconfiguration of neural network algorithms is developed; a Transformer network compression method suited to the reconfigurable array's parallel computing architecture is studied; a layered mapping scheme and partition configuration modes for the Transformer encoder and decoder layers are designed; and a VRP route-planning platform is built to evaluate the experimental results. FPGA test results show that the ED-Transformer achieves software reconfiguration on the array-partitioned dynamic self-reconfigurable structure, with a maximum average route-planning error of only 1.43%. |
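The abstract names a floating-point multiplier based on the Karatsuba algorithm. As an illustration of the underlying recurrence only (the thesis implements this in hardware, presumably on mantissas; the integer form and function name here are my own), a minimal recursive sketch: Karatsuba replaces the four half-width products of schoolbook multiplication with three, at the cost of extra additions.

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply two non-negative integers with Karatsuba's recurrence:
    three half-size multiplications instead of four."""
    if x < 16 or y < 16:               # small operands: multiply directly
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    xh, xl = x >> half, x & mask       # split each operand into two halves
    yh, yl = y >> half, y & mask
    p_hi = karatsuba(xh, yh)           # high*high
    p_lo = karatsuba(xl, yl)           # low*low
    # middle term recovered from one product of the half-sums
    p_mid = karatsuba(xh + xl, yh + yl) - p_hi - p_lo
    return (p_hi << (2 * half)) + (p_mid << half) + p_lo
```

In a hardware floating-point unit, a single level of this split is typically applied to the significand product, trading one wide multiplier for narrower ones plus adders; that trade-off is plausibly what the thesis exploits, though the exact decomposition depth is not stated in the abstract.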
Abstract (English): |
With the wide application of neural network algorithms in more fields, the computational tasks in complex scenarios put higher performance requirements on traditional computing architectures. Reconfigurable structures are considered ideal hardware implementation platforms for neural network algorithms owing to their advantages in power efficiency, area efficiency, and flexibility. However, the implementation of such artificial intelligence algorithms on reconfigurable array structures suffers from low computational efficiency, long reconfiguration time, and poor flexibility. Based on the programmable dynamic self-reconfigurable 3D array chip architecture, this thesis proposes a processor architecture based on array partitioning and a corresponding dynamic self-reconfiguration method. The structure enhances the computational efficiency of the reconfigurable array for neural network algorithms while effectively improving the flexibility of the dynamic self-reconfigurable array structure and realizing regional dynamic self-reconfiguration of the array. Firstly, accelerator structures are designed for large-scale matrix operations, floating-point operations, and multiply-accumulate operations, which are common in artificial intelligence algorithms. These include a large-scale matrix multiplier, a floating-point multiplier based on the Karatsuba algorithm, and a distributed Vedic multiply-accumulator. They effectively accelerate the reconfigurable array processor on such computational tasks while avoiding additional hardware overhead, and give the array structure the ability to support efficient software reconfiguration of neural network algorithms. FPGA test results show that, with similar hardware overhead, the average rate of matrix processing in the array structure is 20.15% higher than that of the original structure.
Secondly, a processing element structure supporting regional array reconfiguration is proposed, and modules such as read/write response and partition activation are designed; a five-stage pipeline structure adapted to neural network applications is proposed, and the corresponding instruction set architecture is designed; matrix operation and special operation instructions required for software reconfiguration of neural network algorithms are added. The array processing elements are thus equipped with partitioned configuration capability, and instruction execution time is reduced. Simulation results show that, in large-scale matrix operation instruction tests, the instruction processing efficiency of the processing element structure supporting regional array reconfiguration is improved by 9.45% compared with the original structure. Thirdly, a self-reconfiguration method based on dynamic partitioning of the array is proposed, and the key modules required for dynamic partitioning are designed; a state collection method based on regional execution state feedback and a hierarchical dynamic partition configuration method are proposed, and the overall structure of the dynamic self-reconfigurable processor based on array partitioning is designed. FPGA test results show that the 16-partition mode reduces state information transmission by 24.21% and saves 8.21% of the execution cycles consumed during reconfiguration. Finally, a prototype system for array-partitioned dynamic self-reconfiguration of neural network algorithms is developed, and a Transformer network compression method for the reconfigurable array parallel computing architecture is studied; the layered mapping scheme and partition configuration modes of the Transformer encoder and decoder layers are designed, and a VRP route planning platform is built to test the experimental results.
The results show that the ED-Transformer achieves software reconfiguration on the array-partitioned dynamic self-reconfigurable structure, and the maximum average error of route planning is only 1.43%. |
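The distributed Vedic multiply-accumulator mentioned in both abstracts is presumably built on the Urdhva-Tiryagbhyam ("vertically and crosswise") sutra, in which every output column is the sum of crosswise digit products, so all partial products of a column can be formed in parallel before a single carry-propagation pass. A behavioral sketch under that assumption, using decimal digits for readability (hardware versions operate on binary partial-product arrays; `vedic_multiply` is an illustrative name, not the thesis's module):

```python
def vedic_multiply(a: int, b: int, base: int = 10) -> int:
    """Urdhva-Tiryagbhyam multiplication: gather each column's crosswise
    digit products in parallel, then resolve carries column by column."""
    da = [int(d) for d in str(a)][::-1]   # least-significant digit first
    db = [int(d) for d in str(b)][::-1]
    cols = [0] * (len(da) + len(db) - 1)
    for i, x in enumerate(da):            # all crosswise partial products;
        for j, y in enumerate(db):        # independent, hence parallelizable
            cols[i + j] += x * y
    result, carry = 0, 0
    for k, c in enumerate(cols):          # single carry-propagation pass
        carry, digit = divmod(c + carry, base)
        result += digit * base ** k
    return result + carry * base ** len(cols)
```

A multiply-accumulate then just sums such products, e.g. `acc = sum(vedic_multiply(x, w) for x, w in pairs)`; distributing the column sums across processing elements is the kind of parallelism the thesis's accumulator likely targets, though the abstract does not detail the mapping.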
CLC number: | TN492 |
Open access date: | 2024-06-19 |