Title: | Research and Design of SRAM In-Memory Computing Structure Based on Reconfigurable Array |
Author: | |
Student ID: | 18206038026 |
Confidentiality Level: | Confidential (to be opened after 4 years) |
Language: | chi |
Discipline Code: | 080903 |
Discipline: | Engineering - Electronic Science and Technology (degrees conferrable in engineering or science) - Microelectronics and Solid-State Electronics |
Student Type: | Master's |
Degree: | Master of Engineering |
Degree Year: | 2021 |
University: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research Direction: | Integrated Circuit Design |
Supervisor: | |
Supervisor's Affiliation: | |
Submission Date: | 2022-02-28 |
Defense Date: | 2021-12-03 |
English Title: | Research and Design of SRAM In-Memory Computing Structure Based on Reconfigurable Array |
Keywords: | |
English Keywords: | Reconfigurable Computing ; Computing-in-Memory ; Array Processor ; Static Random Access Memory ; Computer Architecture |
Abstract: |
With the rapid development of artificial intelligence applications, the gap between high computational density and insufficient memory bandwidth under the traditional von Neumann architecture keeps widening. To alleviate this problem, large-scale processing systems are shifting from computation-centric to data-centric models. Moreover, owing to their functional flexibility and computational efficiency, reconfigurable array processors have gradually become a promising research direction for energy-efficient computing in emerging applications. To address the increasingly severe "memory wall" problem in reconfigurable array processors, in-memory computing circuits not only support the ordinary read and write operations of a memory circuit, but also perform a variety of computational operations, effectively reducing data movement and thereby further lowering the system's energy consumption. Emerging memories and in-memory computing circuits have broad application prospects in energy-efficient artificial intelligence processors, Internet of Things terminal devices, smart homes, and smart city systems, and therefore deserve continued in-depth research.
First, exploiting the decoupled read and write bit lines of the existing 8T SRAM cell, the read and write strategies of the 8T SRAM cell are analyzed, and circuit design schemes for NOR, NAND, and XOR logic structures are proposed. By properly sizing the logical effort of the inverters and skewed gates, and by turning on the read access transistors of two SRAM cells simultaneously, the read bit line voltage is made to change so that NOR, NAND, and XOR logic operations are realized. Simulation results show that the 8T SRAM in-memory computing cell can both perform conventional read and write operations and accurately implement NOR, NAND, and XOR logic operations.
Second, the circuit is extended from the 8T SRAM cell to build a 4×4 8T SRAM in-memory computing array, and the peripheral control circuits are designed and implemented, so that the 8T SRAM array provides both the read and write functions of a conventional SRAM memory and part of the logic functions of a computing unit. Targeting neural network algorithms, a multiplication scheme based on an array multiplier is designed and implemented; it accelerates multiplication by replacing the AND gates of the multiplier with the NAND operation realized inside the array, thereby improving computational efficiency. Simulation results show that at 1-bit precision the array achieves energy efficiencies of 8.437 TOPS/W, 20.787 TOPS/W, and 0.066 TOPS/W for NOR, NAND, and XOR logic, respectively, and at 4-bit precision it achieves 2.897 TOPS/W, 6.989 TOPS/W, and 0.019 TOPS/W, respectively. Compared with other in-memory computing structures, this structure offers a significant advantage for multi-precision computing, and increasing the precision has only a small impact on latency, computing power, and energy efficiency.
Then, to verify the applicability and effectiveness of the 8T SRAM in-memory computing array within a reconfigurable array structure, a reconfigurable SRAM in-memory computing structure is designed, implemented, and verified by simulation. The structure consists of the CRAM unit, an abstract model of the 8T SRAM in-memory computing array, together with a reconfigurable in-memory computing coprocessor, and it invokes the resources of the reconfigurable array processor efficiently through custom instructions. Simulation results show that the MAC instruction designed for convolution operations can be executed inside the memory, reducing data transfers between the processor and the memory.
Finally, to verify the feasibility of the reconfigurable in-memory computing structure, the 8T SRAM in-memory computing array is extended to 256 bits with address encoding, and a 16-bit precision mode is added. The effectiveness of the reconfigurable in-memory computing coprocessor instructions is verified by simulation in the 1-bit, 4-bit, and 16-bit precision modes, and an FPGA test scheme for the reconfigurable SRAM in-memory computing structure is designed. Simulation and experimental results show that the reconfigurable in-memory computing array achieves energy efficiencies of 15.59 TOPS/W, 8.91 TOPS/W, and 3.32 TOPS/W when executing the MAC instruction at 1-bit, 4-bit, and 16-bit precision, respectively. With the reconfigurable SRAM in-memory computing structure, the processor executes a single 3×3, 5×5, or 11×11 convolution 13.39%, 27.39%, and 39.77% faster, respectively. |
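The dual-row read mechanism summarized above can be captured by a small behavioral model. The Python sketch below is a minimal illustration only, assuming the read stack of each 8T cell is gated by its true storage node (so the precharged read bit line realizes NOR) and that a complementary read path on the inverted node yields AND; the analog sensing through logical-effort-sized skewed inverters is not modeled, and all function names are illustrative rather than taken from the thesis.

```python
# Behavioral sketch (not the thesis circuit): logic obtained when the read word
# lines of two 8T SRAM cells sharing one read bit line are asserted together.
# Assumption: the read stack is gated by the stored value, so the precharged
# read bit line (RBL) discharges if ANY activated cell stores '1' -> NOR(A, B).
# A complementary read path on the inverted nodes is assumed to give AND(A, B).

def rbl_after_dual_read(a: int, b: int) -> int:
    """Precharged RBL stays high (1) only if neither activated cell pulls it down."""
    return 0 if (a or b) else 1                  # = NOR(a, b)

def nor_op(a, b):  return rbl_after_dual_read(a, b)
def and_op(a, b):  return rbl_after_dual_read(1 - a, 1 - b)   # complementary read path
def nand_op(a, b): return 1 - and_op(a, b)                    # inverting (skewed) sense stage
def xor_op(a, b):  return (1 - nor_op(a, b)) & nand_op(a, b)  # XOR = OR AND NOT(AND)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "NOR", nor_op(a, b), "NAND", nand_op(a, b), "XOR", xor_op(a, b))
```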
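The array-multiplier scheme can likewise be sketched behaviorally: each partial product, normally the AND of a multiplicand bit and a multiplier bit, is produced as the in-array NAND and inverted at the read-out before the usual shift-and-add accumulation. The sketch below is an assumption-laden illustration of that idea; the bit width, accumulation order, and helper names are not from the thesis.

```python
# Hypothetical sketch of the NAND-based array-multiplier idea described above.
# The SRAM array is assumed to provide NAND of two stored bits; the periphery
# inverts it to recover the AND partial product, then accumulates with shifts.

def in_array_nand(a_bit: int, b_bit: int) -> int:
    return 1 - (a_bit & b_bit)                 # operation assumed available in the array

def imc_multiply(a: int, b: int, width: int = 4) -> int:
    result = 0
    for j in range(width):                     # one partial-product row per multiplier bit
        b_j = (b >> j) & 1
        row = 0
        for i in range(width):
            a_i = (a >> i) & 1
            pp = 1 - in_array_nand(a_i, b_j)   # invert NAND -> AND partial product
            row |= pp << i
        result += row << j                     # shift-and-add accumulation
    return result

# Sanity check against ordinary integer multiplication for 4-bit operands.
assert all(imc_multiply(a, b) == a * b for a in range(16) for b in range(16))
```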
Abstract (English): |
With the rapid development of artificial intelligence application technology, the gap between high computational density and insufficient memory bandwidth under the traditional von Neumann architecture is worsening. To mitigate this problem, large-scale processing systems are moving from computation-centric to data-centric models. In addition, reconfigurable array processors have gradually become a promising research direction for realizing high-efficiency computing for new applications, owing to their functional flexibility and high computational efficiency. In view of the increasingly serious "memory wall" problem in reconfigurable array processors, an in-memory computing circuit can not only support the common read and write operations of a memory circuit, but also perform a variety of computational operations, which effectively reduces data transfer and further reduces the energy consumption of the system. New memories and in-memory computing circuits have wide application prospects in energy-efficient artificial intelligence processors, Internet of Things terminal devices, smart homes, and smart city systems, and therefore deserve further research.
First, according to the characteristics of the existing 8T SRAM cell with independent read and write bit lines, the read and write strategies of the 8T SRAM cell are analyzed, and circuit design schemes for the NOR, NAND, and XOR logic structures are proposed. By properly designing the logical effort of the inverters and skewed gates, the read bit line voltage changes when the read access transistors of two SRAM cells are turned on at the same time, and the NOR, NAND, and XOR logic operations are thereby realized. Simulation results show that the 8T SRAM in-memory computing cell structure can not only perform conventional read and write operations, but also accurately implement NOR, NAND, and XOR logic operations.
Second, the circuit is extended based on the 8T SRAM cell: a 4×4 8T SRAM in-memory computing array is built, and the peripheral control circuits are designed and implemented. The 8T SRAM array structure thus provides the read and write functions of a traditional SRAM memory as well as some of the logic functions of a computing unit. Aiming at neural network algorithms, a multiplication scheme based on an array multiplier is designed and implemented, which accelerates multiplication by replacing the AND gates in the multiplier with the NAND operation implemented inside the array, thus improving computational efficiency. Simulation results show that the energy efficiency of the NOR, NAND, and XOR logic can reach 8.437 TOPS/W, 20.787 TOPS/W, and 0.066 TOPS/W at 1-bit precision, and 2.897 TOPS/W, 6.989 TOPS/W, and 0.019 TOPS/W respectively at 4-bit precision. Compared with other in-memory computing structures, this structure has great advantages in multi-precision computing. At the same time, increasing the precision has only a small impact on latency, computing power, and energy efficiency.
Then, in order to verify the applicability and effectiveness of the 8T SRAM in-memory computing array in the reconfigurable array structure, a reconfigurable SRAM in-memory computing structure is designed, implemented, and verified by simulation.
This structure is composed of the CRAM unit, an abstract model of the 8T SRAM in-memory computing array, and a reconfigurable in-memory computing coprocessor, and the resources of the reconfigurable array processor are invoked effectively through custom instructions. The simulation results show that the MAC instruction designed for convolution operations can be executed inside the memory, which reduces data transfer between the processor and the memory.
Finally, in order to verify the feasibility of the reconfigurable in-memory computing structure, the 8T SRAM in-memory computing array is expanded to 256 bits with address encoding, and a 16-bit precision mode is added. For the 1-bit, 4-bit, and 16-bit precision modes, the effectiveness of the reconfigurable in-memory computing coprocessor instructions is verified by simulation, and an FPGA test plan for the reconfigurable SRAM in-memory computing structure is designed. The simulation and experimental results show that the energy efficiency of the reconfigurable in-memory computing array executing the MAC instruction at 1-bit, 4-bit, and 16-bit precision is 15.59 TOPS/W, 8.91 TOPS/W, and 3.32 TOPS/W, respectively. Using the reconfigurable SRAM in-memory computing structure increases the speed at which the processor performs a single 3×3, 5×5, or 11×11 convolution by 13.39%, 27.39%, and 39.77%, respectively. |
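As a reading aid for the figures quoted above, the sketch below shows how TOPS/W relates operation count to energy, and how many multiply-accumulate operations a single k×k convolution window requires, which is where the in-memory MAC instruction saves processor-memory traffic. The traffic model and the numeric example are illustrative assumptions, not measurements from the thesis.

```python
# Back-of-the-envelope helpers for the quoted metrics. TOPS/W is operations per
# joule scaled by 1e-12; kernel sizes follow the 3x3 / 5x5 / 11x11 cases above.

def tops_per_watt(ops: float, energy_joules: float) -> float:
    """Energy efficiency: (ops / J) expressed in tera-operations per watt."""
    return ops / energy_joules / 1e12

def macs_per_output_pixel(k: int) -> int:
    """A k x k convolution needs k*k multiply-accumulates per output element."""
    return k * k

if __name__ == "__main__":
    for k in (3, 5, 11):
        macs = macs_per_output_pixel(k)
        # Conventional flow: every weight and activation crosses the memory bus;
        # with an in-memory MAC only the final sum is returned (illustrative model).
        print(f"{k}x{k} kernel: {macs} MACs, ~{2 * macs} operand transfers avoided per output")
    print("example:", tops_per_watt(ops=1e9, energy_joules=1e-4), "TOPS/W")
```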
References: |
CLC Number: | TN492 |
Open Access Date: | 2026-02-27 |