论文中文题名: | 基于结构特征的大规模图划分算法研究 |
姓名: | |
学号: | 201508386 |
学科代码: | 085211 |
学科名称: | 计算机技术 |
学生类型: | 硕士 |
学位年度: | 2018 |
院系: | |
专业: | |
研究方向: | 大数据 |
第一导师姓名: | |
第一导师单位: | |
论文外文题名: | Research on large-scale graphs partitioning algorithm based on structural features |
论文中文关键词: | |
论文外文关键词: | Large-scale Graphs ; Partitioning of Graphs ; Structural Features ; the Number of Crossed Edges ; load balancing |
论文中文摘要: |
对大规模图进行良好划分是实现对其分布式处理的重要基础之一。研究发现已有的图划分方法难以对大规模图实现较优的划分,即便是能够划分大规模图的方法,但由于忽略了图的结构性对划分效果的影响,导致难以对全局实现良好的划分。因此,以降低通信开销和实现负载均衡为目标,研究基于结构特征的大规模图划分算法,对分布式处理大规模图有重要的理论价值和现实意义。
首先,研究大规模图结构特征对划分效果的影响。提出通过顶点度的分布特征来描述图结构特征的方法,并基于真实图数据产生若干顶点数和边数相同但结构特征不同的仿真数据集,通过相似度计算,证明该方法对描述真实大规模图结构特征的有效性。通过实现Hash和点对交换划分算法,实验验证结构特征与划分效果之间的关系。实验结果表明,点对交换算法能够有效降低通信代价且图的顶点度分布差异越大,划分后交叉边数量越少,划分质量越好,由此可证,图的结构特征影响划分效果。
其次,研究如何提取出大规模图数据中的结构特征。包括定义大规模图的结构特征以及提取原则,设计基于顶点度优先策略的图结构特征提取算法,通过参数调优对多个顶点和边数相同但结构特征不同的大规模图进行提取实验,实验结果表明该算法能够有效的提取图结构特征,为下一步实现基于结构特征对大规模图进行划分奠定基础。
最后,研究如何基于图的结构特征对大规模图进行划分。通过实验分析比对Hash划分算法、点对交换算法和遗传算法,从中选取划分效果较优的遗传算法。为达到对有权图进行较优的划分,对其适应度函数进行改进,保证负载均衡的划分结构特征中的关键顶点。在关键顶点得到良好划分基础之上,给出了非关键顶点划分方法,依次完成对整个大规模图的划分。通过对多个真实图数据进行划分实验,结果表明,基于结构特征的大规模图划分算法能够实现负载均衡,其时间性能优于点对交换算法,且交叉边个数少于Hash划分算法,由此可得,该算法对于大规模图的划分是可行且较优的。
﹀
|
论文外文摘要: |
The good partition of large-scale graphs was an important foundation, which was used for achieving its distributed processing. Some studies indicated that the existing graphs partitioning method was difficult to achieve the good partitioning for large-scale graphs, even the method could divide the large-scale graphs. This was because that the method ignored the structural features of graphs. Thus, the existing graphs partitioning method was difficult to achieve a good partition of the whole system. Therefore, to study the structure of large-scale graphs which were aimed to reduce the communication overload and achieve the load balance, and to study the algorithms for the efficient partition of large-scale graphs had an important theoretical value and practical significance for the partition of large-scale graphs.
In the first place, the structural features of large-scale graphs which had influenced on the effect of partition of large-scale graphs were studied, and a method that described the graphs’ structural features by using the distribution features of vertex degree was proposed in this study. Based on the real graph data, several simulation datasets with the same numbers of vertices and edges but different structural features were generated. The validity of the method for describing the structural features of the real large-scale graphs was proved by the calculation of the similarity between the real graphs and the simulation graphs. Secondly, the nexus with the structural features of graphs and the quality of partitioning was verified by the Hash algorithm and point-to-point exchange algorithms. The experiment results showed that the point-to-point exchange algorithms could effectively reduce the number of crossed edges. Additionally, the greater difference in the vertex distribution of graphs caused the small number of crossed edges and good quality of partition. Thus, the effect of the partition was influenced by the structural features of graphs.
In the next, this study researched that how to excavate the structural features of the data of the large-scale graphs. Firstly, the structural features of the large-scale graphs were defined. The graphs structural features mining algorithm which was based on the priority strategy of vertex degree was designed. Then, the mining experiments of multiple large-scale graphs were performed through the parameter tuning. The experiment results showed that this algorithm could effectively excavate the structural features of graphs. Furthermore, this algorithm laid a foundation for the next step in the realization of the large-scale graphs partition, which was based on structural features.
Finally, this study studied that how to divide the large-scale graphs was based on their structural features. To compare the Hash partitioning algorithm, the point-to-point exchange algorithm and the genetic algorithm through the experimental analysis, the genetic algorithm was selected. This was because that the effectiveness of genetic algorithm was better than other two algorithms. In order to achieve a better partition of power graphs, its fitness function was improved to ensure that the key vertices were evenly divided in the load balancing structural features. Based on the fact that the key vertices were well partitioned, a non-critical vertex partitioning method was proposed to complete the division of the entire graph. Overall, the time performance of the proposed large-scale graph partitioning algorithm which was based on the structural features was better than point-to-point exchange algorithm. The quality of number of crossed edges partitioning algorithm was better than the Hash partitioning algorithm. Thus, the results were proved that this algorithm was feasible and superior to the partition of large-scale graphs.
﹀
|
中图分类号: | TP301.6 |
开放日期: | 2018-06-09 |