Title: | Research and Design of SRAM In-Memory Computing Structure Based on Reconfigurable Array |
Author: | |
Student ID: | 18206038026 |
Confidentiality Level: | Confidential (to be opened after 4 years) |
Language: | chi |
Discipline Code: | 080903 |
Discipline: | Engineering - Electronic Science and Technology (degrees conferrable in engineering or science) - Microelectronics and Solid-State Electronics |
Student Type: | Master's |
Degree: | Master of Engineering |
Degree Year: | 2021 |
University: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research Direction: | Integrated Circuit Design |
Supervisor: | |
Supervisor's Affiliation: | |
Submission Date: | 2022-02-28 |
Defense Date: | 2021-12-03 |
English Title: | Research and Design of SRAM In-Memory Computing Structure Based on Reconfigurable Array |
Keywords: | |
English Keywords: | Reconfigurable Computing ; Computing-in-Memory ; Array Processor ; Static Random Access Memory ; Computer Architecture |
Abstract: |
With the rapid development of artificial intelligence applications, the gap between high computational density and insufficient memory bandwidth under the traditional von Neumann architecture keeps widening. To alleviate this problem, large-scale processing systems are shifting from computation-centric to data-centric models. Moreover, owing to their functional flexibility and computational efficiency, reconfigurable array processors have gradually become a promising research direction for energy-efficient computing in emerging applications. To address the increasingly severe "memory wall" problem in reconfigurable array processors, in-memory computing circuits not only support the ordinary read and write operations of a memory circuit, but also perform a variety of computational operations, effectively reducing data movement and thereby further lowering the system's energy consumption. Emerging memories and in-memory computing circuits have broad application prospects in energy-efficient artificial intelligence processors, Internet of Things terminal devices, smart homes, and smart city systems, and therefore deserve continued in-depth research.
First, exploiting the decoupled read and write bit lines of the existing 8T SRAM cell, the read and write strategies of the 8T SRAM cell are analyzed, and circuit design schemes for NOR, NAND, and XOR logic structures are proposed. By properly sizing the logical effort of the inverters and skewed gates, and by turning on the read access transistors of two SRAM cells simultaneously, the read bit line voltage is made to change so that NOR, NAND, and XOR logic operations are realized. Simulation results show that the 8T SRAM in-memory computing cell can both perform conventional read and write operations and accurately implement NOR, NAND, and XOR logic operations.
Second, the circuit is extended from the 8T SRAM cell to build a 4×4 8T SRAM in-memory computing array, and the peripheral control circuits are designed and implemented, so that the 8T SRAM array provides both the read and write functions of a conventional SRAM memory and part of the logic functions of a computing unit. Targeting neural network algorithms, a multiplication scheme based on an array multiplier is designed and implemented; it accelerates multiplication by replacing the AND gates of the multiplier with the NAND operation realized inside the array, thereby improving computational efficiency. Simulation results show that at 1-bit precision the array achieves energy efficiencies of 8.437 TOPS/W, 20.787 TOPS/W, and 0.066 TOPS/W for NOR, NAND, and XOR logic, respectively, and at 4-bit precision it achieves 2.897 TOPS/W, 6.989 TOPS/W, and 0.019 TOPS/W, respectively. Compared with other in-memory computing structures, this structure offers a significant advantage for multi-precision computing, and increasing the precision has only a small impact on latency, computing power, and energy efficiency.
Then, to verify the applicability and effectiveness of the 8T SRAM in-memory computing array within a reconfigurable array structure, a reconfigurable SRAM in-memory computing structure is designed, implemented, and verified by simulation. The structure consists of the CRAM unit, an abstract model of the 8T SRAM in-memory computing array, together with a reconfigurable in-memory computing coprocessor, and it invokes the resources of the reconfigurable array processor efficiently through custom instructions. Simulation results show that the MAC instruction designed for convolution operations can be executed inside the memory, reducing data transfers between the processor and the memory.
Finally, to verify the feasibility of the reconfigurable in-memory computing structure, the 8T SRAM in-memory computing array is extended to 256 bits with address encoding, and a 16-bit precision mode is added. The effectiveness of the reconfigurable in-memory computing coprocessor instructions is verified by simulation in the 1-bit, 4-bit, and 16-bit precision modes, and an FPGA test scheme for the reconfigurable SRAM in-memory computing structure is designed. Simulation and experimental results show that the reconfigurable in-memory computing array achieves energy efficiencies of 15.59 TOPS/W, 8.91 TOPS/W, and 3.32 TOPS/W when executing the MAC instruction at 1-bit, 4-bit, and 16-bit precision, respectively. With the reconfigurable SRAM in-memory computing structure, the processor executes a single 3×3, 5×5, or 11×11 convolution 13.39%, 27.39%, and 39.77% faster, respectively. |
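The dual-row read mechanism summarized above can be captured by a small behavioral model. The Python sketch below is a minimal illustration only, assuming the read stack of each 8T cell is gated by its true storage node (so the precharged read bit line realizes NOR) and that a complementary read path on the inverted node yields AND; the analog sensing through logical-effort-sized skewed inverters is not modeled, and all function names are illustrative rather than taken from the thesis.

```python
# Behavioral sketch (not the thesis circuit): logic obtained when the read word
# lines of two 8T SRAM cells sharing one read bit line are asserted together.
# Assumption: the read stack is gated by the stored value, so the precharged
# read bit line (RBL) discharges if ANY activated cell stores '1' -> NOR(A, B).
# A complementary read path on the inverted nodes is assumed to give AND(A, B).

def rbl_after_dual_read(a: int, b: int) -> int:
    """Precharged RBL stays high (1) only if neither activated cell pulls it down."""
    return 0 if (a or b) else 1                  # = NOR(a, b)

def nor_op(a, b):  return rbl_after_dual_read(a, b)
def and_op(a, b):  return rbl_after_dual_read(1 - a, 1 - b)   # complementary read path
def nand_op(a, b): return 1 - and_op(a, b)                    # inverting (skewed) sense stage
def xor_op(a, b):  return (1 - nor_op(a, b)) & nand_op(a, b)  # XOR = OR AND NOT(AND)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "NOR", nor_op(a, b), "NAND", nand_op(a, b), "XOR", xor_op(a, b))
```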
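The array-multiplier scheme can likewise be sketched behaviorally: each partial product, normally the AND of a multiplicand bit and a multiplier bit, is produced as the in-array NAND and inverted at the read-out before the usual shift-and-add accumulation. The sketch below is an assumption-laden illustration of that idea; the bit width, accumulation order, and helper names are not from the thesis.

```python
# Hypothetical sketch of the NAND-based array-multiplier idea described above.
# The SRAM array is assumed to provide NAND of two stored bits; the periphery
# inverts it to recover the AND partial product, then accumulates with shifts.

def in_array_nand(a_bit: int, b_bit: int) -> int:
    return 1 - (a_bit & b_bit)                 # operation assumed available in the array

def imc_multiply(a: int, b: int, width: int = 4) -> int:
    result = 0
    for j in range(width):                     # one partial-product row per multiplier bit
        b_j = (b >> j) & 1
        row = 0
        for i in range(width):
            a_i = (a >> i) & 1
            pp = 1 - in_array_nand(a_i, b_j)   # invert NAND -> AND partial product
            row |= pp << i
        result += row << j                     # shift-and-add accumulation
    return result

# Sanity check against ordinary integer multiplication for 4-bit operands.
assert all(imc_multiply(a, b) == a * b for a in range(16) for b in range(16))
```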
Abstract (English): |
With the rapid development of artificial intelligence application technology, the gap between high computational density and insufficient memory bandwidth under the traditional von Neumann architecture is worsening. To mitigate this problem, large-scale processing systems are moving from computation-centric to data-centric models. In addition, reconfigurable array processors have gradually become a promising research direction for realizing high-efficiency computing for new applications, owing to their functional flexibility and high computational efficiency. In view of the increasingly serious "memory wall" problem in reconfigurable array processors, an in-memory computing circuit can not only support the common read and write operations of a memory circuit, but also perform a variety of computational operations, which effectively reduces data transfer and further reduces the energy consumption of the system. New memories and in-memory computing circuits have wide application prospects in energy-efficient artificial intelligence processors, Internet of Things terminal devices, smart homes, and smart city systems, and therefore deserve further research.
First, according to the characteristics of the existing 8T SRAM cell with independent read and write bit lines, the read and write strategies of the 8T SRAM cell are analyzed, and circuit design schemes for the NOR, NAND, and XOR logic structures are proposed. By properly designing the logical effort of the inverters and skewed gates, the read bit line voltage changes when the read access transistors of two SRAM cells are turned on at the same time, and the NOR, NAND, and XOR logic operations are thereby realized. Simulation results show that the 8T SRAM in-memory computing cell structure can not only perform conventional read and write operations, but also accurately implement NOR, NAND, and XOR logic operations.
Second, the circuit is extended based on the 8T SRAM cell: a 4×4 8T SRAM in-memory computing array is built, and the peripheral control circuits are designed and implemented. The 8T SRAM array structure thus provides the read and write functions of a traditional SRAM memory as well as some of the logic functions of a computing unit. Aiming at neural network algorithms, a multiplication scheme based on an array multiplier is designed and implemented, which accelerates multiplication by replacing the AND gates in the multiplier with the NAND operation implemented inside the array, thus improving computational efficiency. Simulation results show that the energy efficiency of the NOR, NAND, and XOR logic can reach 8.437 TOPS/W, 20.787 TOPS/W, and 0.066 TOPS/W at 1-bit precision, and 2.897 TOPS/W, 6.989 TOPS/W, and 0.019 TOPS/W respectively at 4-bit precision. Compared with other in-memory computing structures, this structure has great advantages in multi-precision computing. At the same time, increasing the precision has only a small impact on latency, computing power, and energy efficiency.
Then, in order to verify the applicability and effectiveness of the 8T SRAM in-memory computing array in the reconfigurable array structure, a reconfigurable SRAM in-memory computing structure is designed, implemented, and verified by simulation.
This structure is composed of the CRAM unit, an abstract model of the 8T SRAM in-memory computing array, and a reconfigurable in-memory computing coprocessor, and the resources of the reconfigurable array processor are invoked effectively through custom instructions. The simulation results show that the MAC instruction designed for convolution operations can be executed inside the memory, which reduces data transfer between the processor and the memory.
Finally, in order to verify the feasibility of the reconfigurable in-memory computing structure, the 8T SRAM in-memory computing array is expanded to 256 bits with address encoding, and a 16-bit precision mode is added. For the 1-bit, 4-bit, and 16-bit precision modes, the effectiveness of the reconfigurable in-memory computing coprocessor instructions is verified by simulation, and an FPGA test plan for the reconfigurable SRAM in-memory computing structure is designed. The simulation and experimental results show that the energy efficiency of the reconfigurable in-memory computing array executing the MAC instruction at 1-bit, 4-bit, and 16-bit precision is 15.59 TOPS/W, 8.91 TOPS/W, and 3.32 TOPS/W, respectively. Using the reconfigurable SRAM in-memory computing structure increases the speed at which the processor performs a single 3×3, 5×5, or 11×11 convolution by 13.39%, 27.39%, and 39.77%, respectively. |
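As a reading aid for the figures quoted above, the sketch below shows how TOPS/W relates operation count to energy, and how many multiply-accumulate operations a single k×k convolution window requires, which is where the in-memory MAC instruction saves processor-memory traffic. The traffic model and the numeric example are illustrative assumptions, not measurements from the thesis.

```python
# Back-of-the-envelope helpers for the quoted metrics. TOPS/W is operations per
# joule scaled by 1e-12; kernel sizes follow the 3x3 / 5x5 / 11x11 cases above.

def tops_per_watt(ops: float, energy_joules: float) -> float:
    """Energy efficiency: (ops / J) expressed in tera-operations per watt."""
    return ops / energy_joules / 1e12

def macs_per_output_pixel(k: int) -> int:
    """A k x k convolution needs k*k multiply-accumulates per output element."""
    return k * k

if __name__ == "__main__":
    for k in (3, 5, 11):
        macs = macs_per_output_pixel(k)
        # Conventional flow: every weight and activation crosses the memory bus;
        # with an in-memory MAC only the final sum is returned (illustrative model).
        print(f"{k}x{k} kernel: {macs} MACs, ~{2 * macs} operand transfers avoided per output")
    print("example:", tops_per_watt(ops=1e9, energy_joules=1e-4), "TOPS/W")
```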
References: |
CLC Number: | TN492 |
Open Access Date: | 2026-02-27 |