Thesis Information

Chinese Title:

 基于可重构阵列的STT-MTJ存算一体结构研究与设计 (Research and Design of an STT-MTJ In-Memory Computing Structure Based on a Reconfigurable Array)

Name:

 Wang Xin

Student ID:

 19206107029

Confidentiality Level:

 Confidential (open after 1 year)

Language:

 Chinese

Discipline Code:

 080903

Discipline:

 Engineering - Electronic Science and Technology (Engineering or Science degree may be conferred) - Microelectronics and Solid-State Electronics

Student Type:

 Master's

Degree:

 Master of Engineering

Degree Year:

 2022

Degree-Granting Institution:

 Xi'an University of Science and Technology

School:

 College of Electrical and Control Engineering

Major:

 Electronic Science and Technology

Research Area:

 Integrated Circuit Design

Primary Supervisor:

 Jiang Lin

Supervisor's Institution:

 Xi'an University of Science and Technology

Submission Date:

 2022-06-29

Defense Date:

 2022-06-07

English Title:

 Research and Design of an STT-MTJ In-Memory Computing Structure Based on a Reconfigurable Array

Chinese Keywords:

 spin transfer torque magnetic tunnel junction (STT-MTJ) device; in-memory computing; reconfigurable architecture; array processor; convolutional neural network; memory wall

English Keywords:

 spin transfer torque magnetic tunnel junction device; in-memory computing; reconfigurable architecture; array processor; convolutional neural network; memory wall

Chinese Abstract:

The emergence of the spin transfer torque magnetic tunnel junction (STT-MTJ), a new type of non-volatile memory device, offers an effective route to in-memory computing architectures. For reconfigurable array processors, designing an STT-MTJ-based in-memory computing array structure is an effective way to alleviate the "memory wall" problem of reconfigurable processors. The STT-MTJ-based in-memory computing array moves part of the computation into the data storage array itself: besides the read and write operations of a conventional memory, it can also execute a variety of logic operations, and by reducing data movement between the processor and memory it increases the usable bandwidth.

First, building on existing modeling methods for emerging non-volatile random access memories, the structural characteristics of the STT-MTJ are studied. A behavioral model of the STT-MTJ, including its state-switching mechanism, is established on the basis of Sun's physical device model. Starting from the physical equations of the STT magnetic tunnel junction structure (resistance, critical current, and switching condition), a behavioral simulation model is written in Verilog-A so that it can be co-simulated with CMOS circuits in a SPICE simulator. The P and AP states of the STT-MTJ are verified with a SPICE-compatible AMS simulator, giving a write-'1' switching current of 115.88 μA and a write-'0' switching current of -12.23 μA. The required switching time varies with the magnitude of the switching current; the R-I curve of the STT-MTJ is plotted to verify the hysteresis behavior of the model.

Second, a "1T1M" (one transistor, one MTJ) cell is built from the behavioral STT-MTJ model, a 16×16 1T1M STT-MTJ in-memory computing array is constructed, and the corresponding peripheral circuits are designed. A mode-selection module switches the array between computing mode and storage mode. For computing mode, logic circuits are designed to implement five operations, AND, OR, NOT, MUL, and ADD, which improves efficiency for the large volume of uniform computations in neural network algorithms. Simulation results show that the array performs these five operations at 16-bit precision with energy efficiencies of 2.053 TOPS/W, 2.133 TOPS/W, 2.479 TOPS/W, 5.654 TOPS/W, and 3.145 TOPS/W, respectively. Compared with other in-memory computing structures, this structure has a clear advantage in computational precision, and raising the precision has little impact on latency, throughput, and energy efficiency.

Next, a reconfigurable STT-MTJ-based in-memory computing structure is proposed for reconfigurable array processors. Program instructions configure the structure in real time, switching it between systolic mode and convolution mode and enabling adaptive scheduling of the whole structure. A dedicated MAC register inside the structure executes data-intensive computations in parallel, which speeds up algorithm processing. Experimental results show that performing convolution in the reconfigurable in-memory computing array reduces the frequent data transfers between the reconfigurable-array PEs and the memory structure. Compared with the original design, execution time for 3×3, 5×5, and 11×11 convolutions is reduced by 13.95%, 14.16%, and 14.26%, respectively.

Finally, the proposed structure is verified by functional simulation and FPGA testing. The STT-MTJ in-memory computing array is first address-encoded, and a 16-bit precision encoding mode is designed. Simulation confirms the validity of the instructions for the reconfigurable in-memory computing structure: compared with other STT-MTJ in-memory logic designs, the reconfigurable STT-MTJ array designed here supports all five logic operations at 16-bit precision, and for the same array structure and precision mode it supports more types of logic operations with relatively higher energy efficiency. A neural network workload is then selected: by mapping the AlexNet network, the reconfigurable in-memory computing array is simulated and FPGA-tested jointly with the reconfigurable processor. The results show a maximum frequency of 110 MHz and a hardware cost of 120,317 LUTs and 22,992 flip-flops. Compared with other published work, hardware resource consumption is reduced by 46.9%; compared with the design without the in-memory computing structure, performance improves by 19.93% on average.

English Abstract:

The emergence of the spin transfer torque magnetic tunnel junction (STT-MTJ), a new type of non-volatile memory device, provides an effective way to realize in-memory computing architectures. For reconfigurable array processors, designing an in-memory computing array based on STT-MTJs is an effective method for addressing the "memory wall" problem of reconfigurable processors. Such an array moves part of the computation into the data storage array itself: it not only supports the read and write operations of conventional memory but can also perform a variety of logic operations, and it increases the available bandwidth by reducing data movement between the processor and the memory.

First, the structural characteristics of the STT-MTJ are studied on the basis of existing modeling methods for emerging non-volatile random access memories. A behavioral model of the STT-MTJ, including its state-switching mechanism, is established from Sun's physical device model. Based on the physical equations of the STT magnetic tunnel junction structure (resistance, critical current, and switching condition), a behavioral simulation model is written in Verilog-A that can be co-simulated with CMOS circuits in a SPICE simulator. Using a SPICE-compatible AMS simulator, the P and AP states of the STT-MTJ are verified, yielding a write-'1' switching current of 115.88 μA and a write-'0' switching current of -12.23 μA. The required switching time varies with the magnitude of the switching current, and the R-I curve of the STT-MTJ is plotted to verify the hysteresis behavior of the model.
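For context, the block below sketches a commonly used form of the quantities this paragraph names (two-state resistance, critical current, and switching time in the thermally activated regime). These are illustrative textbook relations for Sun-style behavioral models, not the thesis's exact equations, and the symbols (α, η, Ms, V, Hk, Eb) are standard assumptions rather than values taken from the thesis.

```latex
% Illustrative Sun-style relations; the thesis's exact forms may differ.
R_{\mathrm{AP}} = R_{\mathrm{P}}\,(1 + \mathrm{TMR})
\qquad
I_{c0} \propto \frac{2e}{\hbar}\,\frac{\alpha}{\eta}\,\mu_0 M_s V H_k
\qquad
\tau_{\mathrm{sw}} \approx \tau_0 \exp\!\Big[\frac{E_b}{k_B T}\Big(1 - \frac{I}{I_{c0}}\Big)\Big] \quad (I < I_{c0})
```

In this regime a larger write current shortens the expected switching time exponentially, which is the current-time trade-off that the R-I characterization probes.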

Second, based on the behavioral STT-MTJ simulation model, a "1T1M" (one transistor, one MTJ) cell structure is established. A 16×16 1T1M STT-MTJ in-memory computing array is built, and the corresponding peripheral circuits are designed. A mode-selection module switches the array between computing mode and storage mode. In computing mode, dedicated logic circuits implement five operations (AND, OR, NOT, MUL, and ADD), improving efficiency for the large volume of uniform computations found in neural network algorithms. Simulation results show that the in-memory computing array performs AND, OR, NOT, MUL, and ADD at 16-bit precision with energy efficiencies of 2.053 TOPS/W, 2.133 TOPS/W, 2.479 TOPS/W, 5.654 TOPS/W, and 3.145 TOPS/W, respectively. Compared with other in-memory computing structures, this structure has a clear advantage in computational precision, and increasing the precision has little impact on latency, throughput, and energy efficiency.
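To make the computing mode concrete, here is a minimal Python sketch of the sensing principle that resistive in-memory logic designs commonly use for AND/OR: activating two cells on one bitline puts their resistances in parallel, and comparing the equivalent resistance against different references distinguishes the input combinations. The resistance values, the "logic 1 = low-resistance P state" convention, and the reference placement are assumptions for illustration, not the thesis's circuit.

```python
# Minimal sketch of bitline-parallel resistive in-memory AND/OR (assumed values).
R_P, R_AP = 2e3, 4e3   # assumed parallel/antiparallel MTJ resistances (ohms)

def r_cell(bit):
    """Logic '1' stored as the low-resistance P state (an assumption)."""
    return R_P if bit else R_AP

def r_parallel(a, b):
    """Two activated cells on one bitline combine as parallel resistances."""
    return 1.0 / (1.0 / r_cell(a) + 1.0 / r_cell(b))

# Three distinguishable levels: (1,1) < (1,0)/(0,1) < (0,0).
REF_AND = r_parallel(1, 0) * 0.99   # below the mixed level: only (1,1) passes
REF_OR  = r_parallel(0, 0) * 0.99   # below the all-'0' level: only (0,0) fails

def in_memory_and(a, b):
    return r_parallel(a, b) < REF_AND

def in_memory_or(a, b):
    return r_parallel(a, b) < REF_OR

for a in (0, 1):
    for b in (0, 1):
        assert in_memory_and(a, b) == bool(a and b)
        assert in_memory_or(a, b) == bool(a or b)
```

NOT, MUL, and ADD build on the same sensing step plus peripheral logic, but those circuits are design-specific and not reproduced here.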

Then, the thesis proposes a reconfigurable in-memory computing structure based on STT-MTJs for reconfigurable array processors. Program instructions configure the structure in real time, switching it between systolic mode and convolution mode and enabling adaptive scheduling of the entire structure. A dedicated MAC register inside the structure executes data-intensive computations in parallel, which raises algorithm throughput. Experimental results show that performing convolution in the reconfigurable in-memory computing array reduces the frequent data transfers between the reconfigurable-array PEs and the memory structure. Compared with the original design, the execution time of 3×3, 5×5, and 11×11 convolutions is reduced by 13.95%, 14.16%, and 14.26%, respectively.
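The sketch below illustrates, in plain Python, why a MAC register inside the array cuts traffic for convolution: partial products accumulate next to the data, so per window only one accumulated result crosses back to a PE instead of every operand. The function name and the transfer-counting model are hypothetical illustrations, not the thesis's scheduling scheme.

```python
import numpy as np

def conv2d_in_memory(img, kernel):
    """Valid (no-padding) 2D convolution with an in-array MAC accumulator."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    transfers = 0
    for i in range(oh):
        for j in range(ow):
            mac = 0.0                       # MAC register inside the array
            for u in range(kh):
                for v in range(kw):
                    mac += img[i + u, j + v] * kernel[u, v]
            out[i, j] = mac
            transfers += 1                  # only the final sum moves to the PE
    return out, transfers

img, k = np.random.rand(8, 8), np.random.rand(3, 3)
out, t = conv2d_in_memory(img, k)
assert np.allclose(out, np.array([[(img[i:i+3, j:j+3] * k).sum()
                                   for j in range(6)] for i in range(6)]))
# A baseline without the in-array MAC would move kh*kw operands per window:
print(t, "result transfers vs", t * k.size, "operand transfers in the baseline")
```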

Finally, functional simulation and FPGA testing of the proposed structure are carried out. First, the STT-MTJ in-memory computing array is address-encoded, and a 16-bit precision encoding mode is designed. Simulation verifies the validity of the instructions for the reconfigurable in-memory computing structure. Compared with other STT-MTJ in-memory logic designs, the reconfigurable STT-MTJ array designed in this thesis supports all five logic operations at 16-bit precision; for the same array structure and precision mode, it supports more types of logic operations with relatively higher energy efficiency. Second, a neural network workload is selected: by mapping the AlexNet network, the designed reconfigurable in-memory computing array and the reconfigurable processor are jointly simulated and tested on an FPGA. The results show a maximum frequency of 110 MHz and a hardware cost of 120,317 LUTs and 22,992 flip-flops. Compared with other published work, hardware resource consumption is reduced by 46.9%; compared with the design without the in-memory computing structure, performance improves by 19.93% on average.
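As a rough illustration of what a "16-bit precision encoding mode" can mean in a 16-wide array, the sketch below stores each operand as one bit per column of a row, so a bitwise operation on two rows produces all 16 result bits in a single array operation. This encoding and the helper names are assumptions for illustration; the thesis's actual address coding is not reproduced here.

```python
# Hypothetical 16-bit row encoding for a 16-wide in-memory array.
WIDTH = 16

def encode(word):
    """Split a 16-bit word into one bit per column, LSB first."""
    return [(word >> i) & 1 for i in range(WIDTH)]

def decode(bits):
    return sum(b << i for i, b in enumerate(bits))

def row_and(row_a, row_b):
    # Each column computes independently, as in bitline-parallel sensing.
    return [a & b for a, b in zip(row_a, row_b)]

a, b = 0xBEEF, 0x0FF0
assert decode(row_and(encode(a), encode(b))) == (a & b)
```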


CLC Number:

 TN492    

Open Date:

 2023-06-29    
