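Title: | 面向导航定位的SIREA芯片测试基准与测试平台开发 |
Author: | |
Student ID: | 21207223073 |
Confidentiality level: | Confidential (open after 4 years) |
Language: | chi |
Discipline code: | 085400 |
Discipline: | Engineering - Electronic Information |
Student type: | Master's |
Degree: | Master of Engineering |
Degree year: | 2024 |
University: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research direction: | Application-Specific Integrated Circuit (ASIC) design |
Supervisor name: | |
Supervisor affiliation: | |
Submission date: | 2024-06-24 |
Defense date: | 2024-06-06 |
English title: | SIREA Chip Test Benchmark and Platform Development for Navigation and Positioning |
Keywords: | |
English keywords: | Self-reconfigurable; Navigation Positioning; Artificial Intelligence Chip; Test Method; Test Benchmark; Test Platform |
Abstract: |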
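In recent years, LiDAR-based Simultaneous Localization and Mapping (SLAM) algorithms have played an important role in high-precision, real-time localization, navigation, and map construction. As algorithm structures have been repeatedly improved and iterated, SLAM systems have grown increasingly complex, and general-purpose processors can no longer adequately meet the computational demands of such applications. At the same time, chip computing architectures have begun to evolve toward customization, giving rise to a series of Artificial Intelligence (AI) chips for autonomous driving. In the design and development of AI processors, standard benchmark programs and test metrics are essential. This thesis proposes a benchmark suite for navigation and positioning scenarios. The suite is intended to evaluate current navigation and positioning hardware objectively, judge the soundness of a processor design, compare the strengths and weaknesses of different processor designs, guide system optimization at the software and hardware levels, and help researchers design more efficient autonomous-driving AI chips.
First, for the test object of this thesis, the self-reconfigurable and self-evolving AI chip (SIREA[1]) developed under a national major project, core metrics such as multi-task switching time, operating frequency, full-array power consumption, and chip computing power are selected as the focus of testing, and functional fitting and performance data conversion of the prototype system are carried out based on the design targets of the chip sample. The test methods for these metrics are studied and their test procedures are defined. Built-in self-test technology is studied: a 16th-order Linear Feedback Shift Register (LFSR) generates 65535 test vectors for built-in circuit self-test. The test method for the Advanced Extensible Interface (AXI) bus is also studied. Experimental results show that configuration information can be delivered correctly over the AXI bus and that computation results can be written back over the AXI bus.
Second, the Profile performance analysis tool is used to obtain the modular characteristics and operator-level features of the benchmark algorithms, from which the navigation and positioning benchmark suite, consisting of MacroBenchmark and MicroBenchmark, is proposed. The former contains four SLAM algorithm modules: point cloud matching, map construction, loop closure detection, and graph optimization. The latter contains four typical matrix operation types: matrix transpose, matrix multiplication, matrix decomposition, and matrix inversion. A method for converting floating-point data to integers is studied; by balancing resource usage against computation error, the operand width is quantized to 16 bits, of which 8 bits are fractional. A precision compensation method for quantization error is studied; a three-level residual compensation strategy keeps the cumulative trajectory error on the order of 10^-5, which meets the accuracy requirements of navigation scenarios.
Then, an overall scheme for a prototype-system test platform with "RK3588+VU440" as its core components is proposed, and a data communication module based on the Peripheral Component Interconnect Express (PCIe) interface is designed. A multi-channel Double Data Rate (DDR) memory controller is designed with a fixed-priority arbitration strategy so that data are read and written in order. The arbitration circuit operates at 247.7 MHz, occupies 181 Look-Up Tables (LUTs) and 156 Flip-Flops (FFs), and has a minimum delay of 4.037 ns. The overall function of the test platform is tested with an accumulation program that traverses all Processing Elements (PEs), and the measured results match the on-chip PE integration scale.
Finally, prototype testing of the SIREA chip is carried out based on the navigation and positioning benchmark suite. A scheme for mapping the suite onto the self-reconfigurable PE array is designed, multi-task switching is introduced to extend the mapping to more complex algorithms, and algorithm execution is verified with key performance metrics such as average relative error and the number of memory accesses. Hardware experiments on the VU440 development board show that the mapping scheme supports high-speed parallel computation while introducing only about 0.1% computation error. The proposed benchmark suite is well matched to the SIREA chip and handles algorithm compatibility and computation accuracy well. In addition, the measured multi-task switching time is 35 cycles, the full-array power consumption is 8.06 W, the operating frequency is 133.5 MHz, and the chip computing power is 36.85 GOPS; all measured metrics meet the expected results for the prototype system. |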
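To illustrate the built-in self-test vector count quoted in the abstract, here is a minimal sketch of a 16-bit Fibonacci LFSR. The tap polynomial x^16 + x^14 + x^13 + x^11 + 1 and the seed 0xACE1 are assumptions, chosen because they form a well-known maximal-length configuration; the abstract does not say which polynomial or seed the thesis uses, and the function names are hypothetical. A maximal-length 16-bit LFSR visits every nonzero state exactly once per period, which gives 2^16 - 1 = 65535 vectors.

```python
def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR by one step.

    Assumed taps: 16, 14, 13, 11 (bits 0, 2, 3, 5 when shifting right).
    """
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_vectors(seed: int = 0xACE1):
    """Yield one full period of LFSR states, starting from a nonzero seed."""
    if not 0 < seed <= 0xFFFF:
        raise ValueError("seed must be a nonzero 16-bit value")
    state = seed
    while True:
        yield state
        state = lfsr16_step(state)
        if state == seed:  # the sequence has wrapped around
            return


vectors = list(lfsr16_vectors())
print(len(vectors))  # 65535
```

The all-zero state is excluded because an XOR-feedback LFSR would stay at zero forever, which is why the count is 65535 rather than 65536.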
English abstract: |
In recent years, LiDAR-based Simultaneous Localization and Mapping (SLAM) algorithms have played an important role in high-precision, real-time localization, navigation, and map construction. With the continuous improvement and iteration of algorithm structures, SLAM systems have become increasingly complex, and general-purpose processors can no longer meet the computing needs of such applications. At the same time, chip architectures have begun to evolve toward customization, and a series of Artificial Intelligence (AI) chips for autonomous driving has emerged. In the design and development of AI processors, standard benchmark programs and test metrics are very important. This thesis proposes a benchmark suite for navigation and positioning scenarios, which is used to evaluate current navigation and positioning hardware objectively, judge the rationality of a processor design, compare the advantages and disadvantages of different processor designs, guide system optimization at the hardware and software levels, and help researchers design more efficient autonomous driving AI chips.
Firstly, for the test object of this thesis, namely the self-reconfigurable and self-evolving AI chip (SIREA) developed under a national major project, core metrics such as multi-task switching time, operating frequency, full-array power consumption, and chip computing power are selected as the test focus, and functional fitting and performance data conversion of the prototype system are carried out based on the design objectives of the chip sample. The test methods for these metrics are studied and their test procedures are clarified. Built-In Self-Test technology is studied, and a 16th-order Linear Feedback Shift Register (LFSR) is used to generate 65535 test vectors for built-in circuit self-test. The test method for the Advanced Extensible Interface (AXI) bus is also studied. The experimental results show that configuration information can be sent correctly over the AXI bus and that calculation results can be written back over the AXI bus.
Secondly, the modular and operator-level characteristics of the benchmark algorithms are obtained with the Profile performance analysis tool, and the navigation and positioning benchmark suite, consisting of MacroBenchmark and MicroBenchmark, is proposed. The former includes four SLAM algorithm modules: point cloud matching, map construction, loop closure detection, and graph optimization. The latter includes four typical matrix operation types: matrix transpose, matrix multiplication, matrix decomposition, and matrix inversion. A method for converting floating-point data to integers is studied. By balancing resource usage and calculation error, the operand bit width is quantized to 16 bits, with the fractional part fixed at 8 bits. A precision compensation method for quantization error is studied, and a three-level residual compensation strategy keeps the cumulative trajectory error on the order of 10^-5, which meets the accuracy requirements of navigation scenarios.
Then, the overall scheme of a prototype-system test platform with "RK3588+VU440" as its core components is proposed, and a data communication module based on the PCIe interface is designed. A multi-channel DDR controller is designed with a fixed-priority arbitration strategy so that data are read and written in order. The arbitration circuit operates at 247.7 MHz, uses 181 Look-Up Tables (LUTs) and 156 Flip-Flops (FFs), and has a minimum delay of 4.037 ns. The overall function of the test platform is tested with an accumulation program that traverses all Processing Elements (PEs), and the measured results are consistent with the on-chip PE integration scale.
Finally, the prototype test of the SIREA chip is carried out based on the navigation and positioning benchmark suite. A scheme for mapping the suite onto the self-reconfigurable PE array is designed, and multi-task switching is introduced to extend the mapping to more complex algorithms. Algorithm execution is verified with key performance metrics such as average relative error and the number of memory accesses. Hardware experiments on the VU440 board show that the mapping scheme supports high-speed parallel computation while introducing only about 0.1% calculation error. The proposed benchmark suite is well suited to the SIREA chip and handles algorithm compatibility and calculation accuracy well. In addition, the measured multi-task switching time is 35 cycles, the full-array power consumption is 8.06 W, the operating frequency is 133.5 MHz, and the chip computing power is 36.85 GOPS. All metrics are in line with the expected results for the prototype system. |
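The 16-bit operand width with 8 fractional bits mentioned in the abstract can be illustrated with a small fixed-point sketch. Everything beyond those two widths is an assumption made for illustration: a signed two's-complement word, round-half-up rounding, saturation on overflow, and the hypothetical function names below. The thesis's three-level residual compensation is not reproduced here. The last helper shows one common definition of average relative error, which the abstract names as an evaluation metric without defining it.

```python
import math

WORD_BITS = 16                       # operand width from the abstract
FRAC_BITS = 8                        # fractional bits from the abstract
SCALE = 1 << FRAC_BITS               # 256
Q_MIN = -(1 << (WORD_BITS - 1))      # -32768 (signed word is an assumption)
Q_MAX = (1 << (WORD_BITS - 1)) - 1   # 32767


def saturate(q: int) -> int:
    """Clamp an integer to the signed 16-bit range."""
    return max(Q_MIN, min(Q_MAX, q))


def quantize(x: float) -> int:
    """Convert a float to fixed point with round-half-up and saturation."""
    return saturate(math.floor(x * SCALE + 0.5))


def dequantize(q: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values and rescale to 8 fractional bits."""
    return saturate((a * b + (1 << (FRAC_BITS - 1))) >> FRAC_BITS)


def mean_relative_error(reference, approx):
    """Average of |approx - reference| / |reference| over nonzero references."""
    pairs = [(r, a) for r, a in zip(reference, approx) if r != 0]
    return sum(abs(a - r) / abs(r) for r, a in pairs) / len(pairs)


values = [0.1, 1.5, -3.25, 100.0]
restored = [dequantize(quantize(v)) for v in values]
print(mean_relative_error(values, restored))
```

For values inside the representable range, the round-trip error of this scheme is bounded by half a least-significant bit, that is 2^-9 (about 0.00195).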
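References: |
[3] Zhang Weimin. Current status and development trends of hardware benchmarking for deep neural networks[J]. Information and Communications Technology and Policy, 2019, 45(12): 74-79. (in Chinese)
[4] Xu Qingqing, An Hong, Wu Zheng, Jin Xu. Hardware design and performance analysis of mainstream convolutional neural networks[J]. Computer Systems & Applications, 2020, 29(02): 49-57. (in Chinese)
[8] Wei Shaojun, Li Zhaoshi, Zhu Jianfeng, et al. Reconfigurable computing: a software-definable computing engine[J]. Scientia Sinica Informationis, 2020, 50(09): 1407-1426. (in Chinese)
[14] Liu Mingzhe, Xu Guanghui, Tang Tang, et al. A survey of LiDAR SLAM algorithms[J]. Computer Engineering and Applications, 2024, 60(01): 1-14. (in Chinese)
[17] Li Xiaokai, Li Guangyun, Suo Shiheng, et al. Advances in laser SLAM technology[J]. Journal of Navigation and Positioning, 2023, 11(04): 8-17. (in Chinese)
[30] Zhai Jidong. AIPerf: a large-scale artificial intelligence computing power benchmark[J]. Big Data Research, 2021, 7(03): 153-155. (in Chinese)
[51] Ma Baoliang, Cui Lizhen, Li Minchao, et al. Research on a tightly coupled LiDAR/IMU SLAM algorithm for open-pit coal mine environments[J]. Coal Science and Technology, 2024, 52(03): 236-244. (in Chinese)
[52] Chang Libo, Zhang Shengbing. Reconfigurable processor design for mixed-quantization CNNs[J]. Journal of Northwestern Polytechnical University, 2022, 40(02): 344-351. (in Chinese)
[60] Wu Dan, Guo Mengqi. Design of a ZYNQ-based general Gaussian test platform for encoding and decoding[J]. Radio Engineering, 2023, 53(02): 465-470. (in Chinese)
[63] Liu Junxiu, Huang Xingyue, Luo Yuling, et al. A dynamic priority arbitration strategy for hardware interconnect systems of spiking neural networks[J]. Acta Electronica Sinica, 2018, 46(08): 1898-1905. (in Chinese) |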
Chinese Library Classification (CLC) number: | TN492 |
Open date: | 2028-06-26 |