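Chinese title: | 基于可重构阵列的STT-MTJ存算一体结构研究与设计 |
Name: | |
Student ID: | 19206107029 |
Confidentiality level: | Confidential (open after 1 year) |
Thesis language: | chi |
Discipline code: | 080903 |
Discipline name: | Engineering - Electronic Science and Technology (degrees in engineering or science may be conferred) - Microelectronics and Solid-State Electronics |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2022 |
Degree-granting institution: | Xi'an University of Science and Technology |
School/department: | |
Major: | |
Research direction: | Integrated circuit design |
First supervisor name: | |
First supervisor affiliation: | |
Thesis submission date: | 2022-06-29 |
Thesis defense date: | 2022-06-07 |
Foreign-language title: | Research and design of STT-MTJ in-memory computing structure based on reconfigurable array |
Chinese keywords: | |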
Foreign-language keywords: | Spin transfer torque magnetic tunnel junction devices; in-memory computing; reconfigurable structure; array processor; convolutional neural network; memory wall |
Chinese abstract: |
新型非易失性随机自旋转移矩磁隧道结(Spin Transfer Torque-Magnetic Tunnel Junction,STT-MTJ)的出现为存算一体体系架构的实现提供了有效途径。针对可重构阵列处理器,设计基于STT-MTJ的存算一体阵列结构成为解决可重构处理器“存储墙”问题的一种有效方法。基于STT-MTJ的存算一体阵列结构将部分计算搬移到数据存储阵列结构的内部,不仅能够支持传统存储器所具有的读写操作功能,还可以执行多种逻辑运算。通过减少处理器与存储之间的数据移动,增大可用带宽。 首先,在研究现有新型非易失性随机存储器建模方法的基础上,对STT-MTJ的结构特点进行研究。基于Sun物理器件模型建立STT-MTJ的行为级模型,包括STT-MTJ状态翻转机制。基于STT磁隧道结结构的物理方程,包括电阻、临界电流、翻转条件等物理方程,采用Verilog-A语言,建立可与CMOS电路在SPICE仿真器下联合仿真的行为级仿真模型。通过使用兼容SPICE仿真的AMS仿真器,验证STT-MTJ的P状态和AP状态,得到具体的翻转电流为115.88 μA的写‘1’电流和-12.23 μA的写‘0’电流。翻转电流数值与所需时间之间成正比,总结绘制STT-MTJ的R-I曲线,验证模型的磁滞效应。 其次,基于构建的STT-MTJ行为级仿真模型,建立“1T1M”结构。搭建16×16“1T1M”的STT-MTJ存算一体阵列,完成相应的外围电路设计。通过设计模式选择模块实现计算模式和存储模式的切换。基于计算模式设计相应的逻辑运算电路,实现AND、OR、NOT、MUL、ADD五种逻辑计算。针对神经网络算法中形式单一的大量计算,提高计算效率。仿真结果表明,存算一体阵列能实现16 bit精度下AND、OR、NOT、MUL、ADD逻辑计算,分别可以达到2.053 TOPS/W、2.133 TOPS/W、2.479 TOPS/W、5.654 TOPS/W与 3.145 TOPS/W的能效。该结构相比于其它存算一体结构在提升计算精度上具有较大的优势。同时,精度的提升对于延时、算力和能效的影响较低。 然后,论文面向可重构阵列处理器提出一种基于STT-MTJ可重构存算一体结构。通过编程指令实现对基于STT-MTJ的可重构存算一体结构的实时配置,完成收缩模式与卷积模式的切换,实现对整个存算一体结构的自适应调度。存算一体结构内部有专门的MAC寄存器,通过并行执行数据密集型计算,提高算法处理速度。实验结果表明,采用可重构存算一体阵列结构进行卷积计算减少了数据在可重构阵列PE和存储结构之间的频繁传输。与原始设计相比,采用存算一体结构进行3×3、5×5和11×11卷积计算的执行时间分别减少了13.95%、14.16%、14.26%。 最后,对所提结构进行功能仿真与FPGA测试。首先,将STT-MTJ存算一体阵列进行地址编码,设计16 bit精度编码模式。仿真可重构存算一体结构指令的有效性,与其它STT-MTJ存算一体的结构实现逻辑计算的工作相比,论文设计的可重构STT-MTJ阵列可以支持16 bit精度的五种逻辑计算。在阵列结构相同、精度模式相同的情况下,论文能够支持的逻辑计算类型更多,且能效相对较高。其次,选取神经网络算法,通过映射AlexNet网络对所设计的可重构存算一体阵列结构与可重构处理器结构联合进行仿真与FPGA测试。结果表明,论文最高频率可达110 MHz,硬件资源消耗为120317 LUT,22992 Flip Flop。与其它文献相比,论文使用的硬件资源消耗减少46.9%。与不使用存算一体结构相比,性能平均提升19.93%。 |
Foreign-language abstract: |
The emergence of the spin transfer torque magnetic tunnel junction (STT-MTJ), a new type of non-volatile random access memory device, provides an effective path to realizing in-memory computing architectures. For reconfigurable array processors, designing an STT-MTJ-based in-memory computing array is an effective way to address the "memory wall" problem. Such an array moves part of the computation inside the data storage array, so it not only supports the read and write operations of a conventional memory but can also perform a variety of logic operations. Reducing data movement between the processor and memory increases the available bandwidth.
First, building on existing modeling methods for emerging non-volatile random access memories, the structural characteristics of the STT-MTJ are studied. A behavioral model of the STT-MTJ, including its state-switching mechanism, is established from the Sun physical device model. Using the physical equations of the STT magnetic tunnel junction (resistance, critical current, and switching conditions), a behavioral simulation model is written in Verilog-A that can be co-simulated with CMOS circuits in a SPICE simulator. Using a SPICE-compatible AMS simulator, the P state and AP state of the STT-MTJ are verified, giving switching currents of 115.88 μA for writing '1' and -12.23 μA for writing '0'. The switching current value is proportional to the required time, and the R-I curve of the STT-MTJ is plotted to verify the hysteresis of the model.
Second, a "1T1M" structure is built from the STT-MTJ behavioral simulation model. A 16×16 "1T1M" STT-MTJ in-memory computing array is constructed and the corresponding peripheral circuits are designed. A mode selection module switches between computing mode and storage mode. For the computing mode, logic operation circuits are designed that implement five operations: AND, OR, NOT, MUL, and ADD. This improves computational efficiency for the large number of uniform computations in neural network algorithms. Simulation results show that the array performs AND, OR, NOT, MUL, and ADD at 16-bit precision with energy efficiencies of 2.053 TOPS/W, 2.133 TOPS/W, 2.479 TOPS/W, 5.654 TOPS/W, and 3.145 TOPS/W, respectively. Compared with other in-memory computing structures, this structure has a considerable advantage in computational precision, and the higher precision has little impact on latency, computing power, and energy efficiency.
Then, the thesis proposes an STT-MTJ-based reconfigurable in-memory computing structure for reconfigurable array processors. Programming instructions configure the structure in real time, switching between systolic mode and convolution mode and providing adaptive scheduling of the whole in-memory computing structure. The structure contains dedicated MAC registers and speeds up algorithm processing by executing data-intensive computations in parallel. Experimental results show that performing convolution with the reconfigurable in-memory computing array reduces the frequent data transfers between the reconfigurable array PEs and the storage structure. Compared with the original design, the execution time of 3×3, 5×5, and 11×11 convolutions using the in-memory computing structure is reduced by 13.95%, 14.16%, and 14.26%, respectively.
Finally, functional simulation and FPGA testing of the proposed structure are carried out. First, the STT-MTJ in-memory computing array is address-encoded and a 16-bit precision encoding mode is designed. The validity of the instructions of the reconfigurable in-memory computing structure is verified by simulation. Compared with other work that implements logic operations in STT-MTJ in-memory computing structures, the reconfigurable STT-MTJ array designed in this thesis supports five logic operations at 16-bit precision. With the same array structure and the same precision mode, it supports more types of logic operations with relatively high energy efficiency. Second, a neural network algorithm is selected: the AlexNet network is mapped onto the designed reconfigurable in-memory computing array together with the reconfigurable processor for joint simulation and FPGA testing. The results show that the design reaches a maximum frequency of 110 MHz and uses 120317 LUTs and 22992 flip-flops. Compared with other published work, hardware resource consumption is reduced by 46.9%. Compared with not using the in-memory computing structure, performance improves by 19.93% on average. |
Chinese Library Classification number: | TN492 |
Open-access date: | 2023-06-29 |