论文中文题名: | 图计算中遍历类应用的局部性挖掘与优化研究 |
姓名: | |
学号: | 19208049014 |
保密级别: | 公开 |
论文语种: | chi |
学科代码: | 081202 |
学科名称: | 工学 - 计算机科学与技术(可授工学、理学学位) - 计算机软件与理论 |
学生类型: | 硕士 |
学位级别: | 工学硕士 |
学位年度: | 2022 |
培养单位: | 西安科技大学 |
院系: | |
专业: | |
研究方向: | 图计算 |
第一导师姓名: | |
第一导师单位: | |
论文提交日期: | 2022-06-28 |
论文答辩日期: | 2022-06-07 |
论文外文题名: | Research on Locality Mining and Optimization of Traversal Class Applications in Graph Computing |
论文中文关键词: | |
论文外文关键词: | Graph Processing ; Traversal Algorithm ; Locality ; Depth-Branch-Reorder ; Doubly Compressed Sparse Row |
论文中文摘要: |
大数据背景下,物理世界中图数据规模呈爆炸式增长,社交网络分析、数据挖掘、生物网络和推荐系统等应用紧密依赖于图计算系统。然而,图数据的非结构化和不规则性使得图计算在面临遍历、查找等实际问题时,造成内存访问随机性强、局部性差。而图计算操作简单、实时性强的高“访存计算比”不仅带宽需求高,且带宽浪费严重。为解决这一问题,提高图计算系统性能,论文基于综合权衡<图应用,图数据>性能参数评价模型,提出图数据重排序算法和一种新型的深度分支重排序双压缩稀疏行数据格式,缓解局部性差带来的挑战。 首先,为揭示不同图计算框架的差异性,构建综合权衡<图应用,图数据>性能参数评价模型,形成图计算框架分析的充分判据。选取工作负载和代表性数据集,对Ligra、Gemini和GraphBIG图计算框架进行深入分析与研究。围绕影响图计算系统性能的因素,包括图数据结构信息、图计算应用的具体实现、图计算系统的性能指标等,使用perf面向各类真实图进行性能实测与特性分析,进而构建基于图计算框架局部性的性能评价模型。实验结果表明,Gemini框架在执行时间和计算量上性能最佳,Ligra在数据移动量上性能最好,GraphBIG框架的性能最差,尤其在缓存缺失率和执行时间上最高。 其次,针对图应用的实现过程中存在数据访问随机性强、局部性差问题,提出一种基于局部性特征的深度分支重排序算法。根据构建的性能评价模型对遍历类应用进行层次社区和深度社区局部性挖掘,发现数据潜在局部性的问题。基于此,提出一种深度分支重排序算法,对算法性能与挖掘结果进行对比验证和分析。实验结果表明,重排序算法以数据集Amazon作为输入数据时,排序过程中消耗时间最短,性能最优。DBR算法时间开销比Gorder算法降低36.6%,比SeqDFS算法可降低10.6%。在缓存命中率上DBR算法较两者显著提升,以Wiki数据集作为实现BFS应用的输入数据,DBR算法的L1级缓存缺失率比Gorder算法优化18.6%,比SeqDFS算法优化13.5%。 然后,针对遍历类应用的性能需求,选择合适的数据组织格式来提升图计算系统性能问题,设计一种新型图数据组织格式DBR_DCSR。研究遍历类应用与不同数据组织格式之间的关系,组织格式包括COO、CSC、CSR、DCSC、CSCI。根据不同组织格式的内存占用以及性能测试结果,设计出一种新型图数据组织格式DBR_DCSR,并对其性能进行对比验证和分析。实验结果表明,DCSC压缩格式有效提高内存存储效率,CSR压缩格式有效减少程序执行时间、数据移动量以及降低能耗,DBR_DCSR压缩格式相较内存占用最低的DCSC压缩格式内存占用率下降8.2%~11.9%。 最后,为保证局部性优化方案的高效性,选择GraphBIG图计算框架进行测试与验证。基于高性能服务器搭建测试环境,设计总体验证方案。与当前典型图计算框架进行数据移动量、计算量、执行时间和缓存缺失率等指标的性能对比分析。实验结果表明,遍历类应用在GraphBIG上实现DBR_DCSR压缩算法局部性性能最优。以CA数据集作为实现BFS应用的输入数据,在执行时间上,GraphBIG_DBR_DCSR与DBG算法相比减少17.6%,与LBR算法相比减少8.7%。较Ligra与Gemini框架的原始数据分别减少了56.4%、11.1%。在数据移动量上,GraphBIG_DBR_DCSR较Ligra与Gemini框架的原始数据分别减少了7.4%、28.6%。在缓存缺失率上,GraphBIG_DBR_DCSR较Ligra与Gemini框架的原始数据分别减少了19.7%、15.6%。 |
论文外文摘要: |
In the context of big data, the scale of graph data in the physical world has exploded, and applications such as social network analysis, data mining, biological networks, and recommendation systems are closely dependent on graph computing systems. However, due to the unstructured and irregular nature of graph data, when graph computing faces practical problems such as traversal and search, it results in strong randomness and poor locality of memory access. However, the high "memory access calculation ratio" with simple operation and strong real-time performance of graph computing not only requires high bandwidth, but also wastes serious bandwidth. In order to solve this problem and improve the performance of the graph computing system, this paper proposes a graph data reordering algorithm and a novel deep branch reordering double-compressed sparse row data format based on a comprehensive trade-off performance parameter evaluation model. Alleviate the challenges posed by poor locality. Firstly, in order to reveal the differences between different graph computing frameworks, a comprehensive trade-off performance parameter evaluation model is constructed to form a sufficient criterion for graph computing framework analysis. Select workloads and representative datasets to conduct in-depth analysis and research on Ligra, Gemini, and GraphBIG graph computing frameworks. Focusing on the factors that affect the performance of the graph computing system, including graph data structure information, the specific implementation of graph computing applications, and the performance indicators of the graph computing system, perf is used to conduct performance measurement and characteristic analysis for various real graphs, and then build a graph-based computing framework. Locality performance evaluation model. The experimental results show that the Gemini framework has the best performance in terms of execution time and computation volume, Ligra has the best performance in terms of data movement, and the GraphBIG framework has the worst performance, especially in terms of cache miss rate and execution time. Secondly, to solve the problems of strong randomness and poor locality of data access in the implementation of graph applications, a deep branch reordering algorithm based on locality features is proposed. According to the constructed performance evaluation model, hierarchical community and deep community locality mining is carried out for traversal applications, and the potential locality of data is discovered. Based on this, a deep branch reordering algorithm is proposed to compare and analyze the algorithm performance and mining results. The experimental results show that when the reordering algorithm takes the dataset Amazon as the input data, the sorting process consumes the shortest time and has the best performance. The time overhead of the DBR algorithm is 36.6% lower than that of the Gorder algorithm, and 10.6% lower than that of the SeqDFS algorithm. In terms of cache hit rate, the DBR algorithm is significantly higher than the two. Using the Wiki dataset as the input data for BFS applications, the L1 cache miss rate of the DBR algorithm is 18.6% better than that of the Gorder algorithm and 13.5% better than that of the SeqDFS algorithm. Thirdly, according to the performance requirements of traversal applications, an appropriate data organization format is selected to improve the performance of graph computing systems, and a new graph data organization format DBR_DCSR is designed. Research the relationship between traversal applications and different data organization formats, including COO, CSC, CSR, DCSC, CSCI. According to the memory usage and performance test results of different organization formats, a new graph data organization format DBR_DCSR is designed, and its performance is compared, verified and analyzed. The experimental results show that the DCSC compression format effectively improves the memory storage efficiency, the CSR compression format effectively reduces program execution time, data movement and energy consumption, and the DBR_DCSR compression format reduces the memory usage by 8.2%~11.9% compared to the DCSC compression format with the lowest memory usage. Finally, in order to ensure the efficiency of the local optimization scheme, the GraphBIG graph computing framework is selected for testing and verification. Build a test environment based on a high-performance server and design an overall verification scheme. Compare and analyze the performance of data movement, computation, execution time and cache miss rate with the current typical graph computing framework. The experimental results show that the locality performance of the DBR_DCSR compression algorithm achieved by the traversal class application on GraphBIG is the best. Taking the CA dataset as the input data to implement the BFS application, the execution time of GraphBIG_DBR_DCSR is reduced by 17.6% compared with the DBG algorithm and 8.7% compared with the LBR algorithm. Compared with the original data of Ligra and Gemini frameworks, it is reduced by 56.4% and 11.1% respectively. In terms of data movement, GraphBIG_DBR_DCSR is 7.4% and 28.6% less than the original data of Ligra and Gemini frameworks, respectively. In terms of cache miss rate, GraphBIG_DBR_DCSR is 19.7% and 15.6% lower than the original data of Ligra and Gemini frameworks. |
参考文献: | ﹀ |
中图分类号: | TP399 |
开放日期: | 2022-06-29 |