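Thesis title (Chinese):

 面向通信基带信号处理的可重构阵列处理器研究与设计

Author:

 刘帅 (Liu Shuai)

Student ID:

 19207205059

Confidentiality level:

 Public

Thesis language:

 chi

Discipline code:

 085208

Discipline name:

 Engineering - Engineering - Electronics and Communication Engineering

Student type:

 Master's student

Degree level:

 Master of Engineering

Degree year:

 2022

Degree-granting institution:

 西安科技大学 (Xi'an University of Science and Technology)

School:

 College of Communication and Information Engineering

Major:

 Electronics and Communication Engineering

Research direction:

 Integrated circuit design

First supervisor:

 蒋林 (Jiang Lin)

First supervisor's affiliation:

 西安科技大学 (Xi'an University of Science and Technology)

Thesis submission date:

 2022-06-29

Thesis defense date:

 2022-06-08

Thesis title (English):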

 Research and Design of Reconfigurable Array Processor for Communication Baseband Signal Processing    

Keywords:

 Reconfigurable architecture ; Array processor ; Communication baseband algorithm ; Computational granularity ; Parallelization

Abstract:

Reconfigurable architectures offer flexible configuration and therefore have great potential for compute-intensive and memory-intensive applications. Emerging applications in mobile communications place higher demands on the hardware performance of communication baseband signal processing, and reconfigurable architectures, with their strength in parallel computing, have become an ideal hardware platform for implementing baseband signal processing algorithms. However, implementing these algorithms on a reconfigurable array processor suffers from poor adaptability and low computational efficiency. This thesis therefore studies and designs a reconfigurable array processor for baseband signal processing.

First, operators are extracted from typical communication baseband signal processing algorithms, and the fixed-point accuracy of the algorithms is evaluated to guide the design of the reconfigurable array processor. On the one hand, the characteristics of the Fast Fourier Transform (FFT), Finite Impulse Response (FIR) filtering, and massive Multiple-Input Multiple-Output (MIMO) detection algorithms are obtained with a profiling tool, and abstract coarse-grained operators are extracted. On the other hand, fixed-point simulations show that the fixed-point accuracy curve converges when the hardware data width is 15 bits or more.
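As a rough illustration of how a fixed-point evaluation of this kind can be set up (this is not the thesis's own simulation), the Python sketch below quantizes a test sine wave to several word lengths and records the worst-case quantization error. The signed fractional format, the test signal, and the names `quantize` and `max_abs_error` are assumptions made for this example.

```python
import math

def quantize(x, bits):
    """Round x in [-1, 1) to a signed fixed-point value with `bits` total bits.

    Assumed format: 1 sign bit and bits-1 fractional bits, with saturation.
    """
    scale = 1 << (bits - 1)
    q = round(x * scale)
    q = max(-scale, min(scale - 1, q))  # saturate to the representable range
    return q / scale

def max_abs_error(samples, bits):
    """Worst-case absolute quantization error over a list of samples."""
    return max(abs(quantize(s, bits) - s) for s in samples)

# Hypothetical test signal: 64 samples of a sine wave with amplitude 0.9.
samples = [0.9 * math.sin(2 * math.pi * 3 * n / 64 + 0.1) for n in range(64)]

# Error for a few candidate word lengths; it is bounded by half a
# quantization step, i.e. 2**-bits, so it shrinks as the width grows.
errors = {bits: max_abs_error(samples, bits) for bits in (8, 12, 15, 16)}
for bits, err in errors.items():
    print(f"{bits:2d} bits: max abs error = {err:.3e}")
```

A sweep like this only shows the quantization error of the input data; an algorithm-level evaluation would run the full FFT, FIR, or detection computation in fixed point and compare its output with a floating-point reference.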

Second, to address the poor adaptability of baseband signal processing algorithms on the reconfigurable array processor, a reconfigurable processing element for communication applications is designed. The processing element (PE) extends the data width from 16 bits to 32 bits to accommodate complex-number operations. In addition, dedicated instructions for baseband signal processing are added to the PE. Experiments on complex matrix multiplication show that, compared with an implementation using general-purpose instructions, the dedicated-instruction implementation reduces the number of code lines by 74%, the number of memory accesses by 61%, and the average relative error by 85%.
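To make the idea of a 32-bit word carrying one complex operand concrete, here is a minimal Python sketch of a packed complex multiplication. The layout (16-bit real part in the high half, 16-bit imaginary part in the low half), the Q15 scaling, and the function names are assumptions for illustration; the abstract does not specify the PE's actual data format or instruction semantics.

```python
def pack(re, im):
    """Pack two signed 16-bit integers into one 32-bit word (real high, imag low)."""
    return ((re & 0xFFFF) << 16) | (im & 0xFFFF)

def _s16(v):
    """Interpret the low 16 bits of v as a signed two's-complement value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def unpack(word):
    """Return (real, imag) as signed 16-bit integers."""
    return _s16(word >> 16), _s16(word)

def cmul_q15(a, b):
    """Multiply two packed Q15 complex numbers and return a packed Q15 result.

    (ar + j*ai) * (br + j*bi) = (ar*br - ai*bi) + j*(ar*bi + ai*br)
    Products are shifted right by 15 bits (rounding toward negative
    infinity) and saturated to the signed 16-bit range.
    """
    ar, ai = unpack(a)
    br, bi = unpack(b)
    re = (ar * br - ai * bi) >> 15
    im = (ar * bi + ai * br) >> 15
    sat = lambda v: max(-32768, min(32767, v))
    return pack(sat(re), sat(im))

# (0.5 + 0.5j) * (0.5 - 0.5j) = 0.5 + 0j; 0.5 is 16384 in Q15.
product = cmul_q15(pack(16384, 16384), pack(16384, -16384))
print(unpack(product))
```

With general-purpose 16-bit instructions the same operation needs four multiplications, two additions, and separate loads and stores for the real and imaginary parts, which is the kind of overhead a dedicated complex instruction is meant to remove.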

Third, to address the low computational efficiency caused by the mismatch between data of different granularities and the underlying hardware, a structure for dynamically configuring the computational granularity is proposed. The structure supports 8-bit, 16-bit, and 32-bit granularities and provides four functions: data combination, data splitting, parallel addition, and parallel multiplication, which improve the parallelism and flexibility of the array. Experimental results show that the dynamic granularity configuration circuit reaches a maximum operating frequency of 133.5 MHz and can dynamically configure data of different granularities during computation.
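The splitting, combination, and parallel-addition functions can be pictured with a small software model. The Python sketch below treats a 32-bit word as four 8-bit lanes, two 16-bit lanes, or one 32-bit lane; lane ordering, wrap-around overflow, and the function names are assumptions for this example and are not taken from the thesis's circuit.

```python
def split(word, lane_bits):
    """Split a 32-bit word into unsigned lanes, least significant lane first."""
    mask = (1 << lane_bits) - 1
    return [(word >> shift) & mask for shift in range(0, 32, lane_bits)]

def combine(lanes, lane_bits):
    """Combine lanes (least significant first) back into one 32-bit word."""
    mask = (1 << lane_bits) - 1
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & mask) << (i * lane_bits)
    return word

def parallel_add(a, b, lane_bits):
    """Add two 32-bit words lane by lane; carries do not cross lane boundaries."""
    sums = [x + y for x, y in zip(split(a, lane_bits), split(b, lane_bits))]
    return combine(sums, lane_bits)  # combine() masks each sum, so lanes wrap

a, b = 0x01FF00FF, 0x01010101
print(hex(parallel_add(a, b, 8)))   # four independent 8-bit additions
print(hex(parallel_add(a, b, 32)))  # one ordinary 32-bit addition
```

The same pair of operands gives different results at different granularities because the 8-bit mode discards the carry out of each lane, which is what lets one 32-bit datapath perform four narrow additions at once.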

Finally, a reconfigurable array prototype system for communication baseband signal processing is developed, reconfigurable implementations of the FFT, FIR, and massive MIMO detection algorithms are designed, and the system is verified on a Field Programmable Gate Array (FPGA). The results show that parallelizing the butterfly operation module gives a 2.90x speedup for the 8-point FFT, the pipeline-parallel filtering scheme gives a 7.28x speedup for the 8th-order FIR filter, and parallelizing the Gram matrix computation gives a speedup of up to 5.57x for massive MIMO detection. Hardware experiments on the ZC706 development board show that the reconfigurable array processor uses less than 60% of the available resources at an operating frequency of 112 MHz, achieving flexible configuration and parallel acceleration of different algorithms on the array.

Chinese Library Classification (CLC) number:

 TN492

Release date:

 2022-06-29    
