- 无标题文档
查看论文信息

论文中文题名:

 基于社区发现的动态图划分方法研究    

姓名:

 王佳    

学号:

 18208088019    

保密级别:

 公开    

论文语种:

 chi    

学科代码:

 083500    

学科名称:

 工学 - 软件工程    

学生类型:

 硕士    

学位级别:

 工学硕士    

学位年度:

 2021    

培养单位:

 西安科技大学    

院系:

 计算机科学与技术学院    

专业:

 软件工程    

研究方向:

 大数据与云计算    

第一导师姓名:

 罗晓霞    

第一导师单位:

 西安科技大学    

论文提交日期:

 2021-06-21    

论文答辩日期:

 2021-06-04    

论文外文题名:

 A dynamic graph partition method based on Community Detection algorithm    

论文中文关键词:

 动态图划分 ; 社区发现 ; 交叉边 ; 负载均衡 ; 模拟退火    

论文外文关键词:

 Dynamic graph partition ; Community detection ; crossed edge ; load balance ; Simulated annealing    

论文中文摘要:

随着网络的快速发展,以网络为载体的信息量迅速增长,使得图的规模也随之扩大。由于大规模图数据无法使用单机环境进行处理,因此分布式图计算应运而生。分布式图计算需要将图数据合理地分布到每个节点,对图进行合理划分是实现分布式图计算的前提。现实世界的图处于不断的动态演化之中,如何对动态图进行合理划分是当前的一个研究热点。针对动态图划分问题,本文的主要研究内容如下:

首先,研究新增图社区性对动态图划分质量的影响。将给定数量的动态图中更新的顶点看成一张新增图,采用不同的社区发现算法对其进行划分,得到模块度不同的子图。通过实验,将使用不同社区发现算法产生的划分结果进行对比,结果表明,新增图划分后的模块度越大,社区性越强,对动态图的划分质量越好。由此证明,新增图的社区性强弱影响动态图的划分质量,为下一步设计动态图划分方法奠定了基础。

其次,针对动态图划分问题,提出了一种基于社区发现的划分方法。其基本思想是收集给定数量的图更新操作,对于顶点的加入操作,利用社区发现算法对这些顶点进行预划分,将预划分产生的社区结果插入到已经划分好的当前图中,其余的顶点删除、边的加入/删除操作,分别根据收集的信息依次更新。在公开数据集上进行实验,从交叉边数和负载均衡度两方面将本方法与传统流式划分方法进行比较,结果表明,本方法的交叉边数降低了13%,负载均衡度减少了42.3%。由此可得,本方法的划分质量明显优于传统的流式划分方法,能够对动态图进行高效处理,降低子图之间的通信开销。

最后,为了对动态图的划分结果进一步优化,使划分后的子图内部联系紧密,子图之间耦合度低,又避免对整个动态图重新划分,提出了一种基于模拟退火的动态图顶点更新方法。传统顶点更新方法仅根据相邻节点判断顶点位置是否转移,会造成局部最优。本方法在传统顶点更新方法基础上加入了随机扰动的因素,通过对能量函数、更新概率、冷却计划的设计,避免顶点更新陷入局部最优的后果,使得各个子图之间的交叉边数最少,达到全局最优的效果。通过实验,将本方法与传统的顶点更新方法和基于社区发现的动态图划分方法进行比较,结果表明,本方法比传统顶点更新方法交叉边数降低了12.4%,负载均衡度减少了11.7%,比基于社区发现的动态图划分方法交叉边数降低了9.3%,负载均衡度减少了2.2%,由此可得,优化后的方法只通过部分顶点位置更新实现对动态图的划分,能够有效避免对整个动态图进行重新划分,提高了图的划分质量。

论文外文摘要:

With the rapid development of the network, the amount of information based on the network is growing rapidly, which makes the scale of the graph also expand. Because large-scale graph data can not be processed in a single machine environment, distributed graph computing emerges as the times require. Distributed graph computing needs to distribute the graph data evenly to each node, and the reasonable partition of graph is the premise of distributed graph computing. The real world graph is in the process of continuous dynamic evolution. How to partition the dynamic graph reasonably is a research hotspots. Aiming at the problem of dynamic graph partitioning, the main contents of this paper are as follows:

Firstly, we study the influence of the new graph community on the quality of dynamic graph partition. The updated vertex in the dynamic graph in a given period of time is regarded as a new graph, which is partitioned by different community discovery algorithms, and the subgraphs with different modularity are obtained. Through experiments, the partition results generated by different community discovery algorithms are compared. The results show that the greater the modularity, the stronger the community and the better the quality of dynamic graph partition. It is proved that the community of the new graph affects the quality of dynamic graph partition, which lays a foundation for the next step of designing dynamic graph partition method.

Secondly, aiming at the problem of dynamic graph partition, a partition method based on community discovery is proposed. The basic idea is to collect a given number of graph update operations. For the join operation of vertices, the community discovery algorithm is used to pre-partition these vertices, and the community results generated by the pre-partition are inserted into the current graph that has been divided. The remaining vertex deletion and edge join / delete operations are updated in turn according to the collected information. Experiments on open data sets show that the number of cross edges and load balance of this method are reduced by 13% and 42.3% compared with the traditional streaming partition method. It can be concluded that the partition quality of this method is obviously better than that of the traditional stream partition method, and it can efficiently process the dynamic graph and reduce the communication overhead between subgraphs.

Finally, in order to optimize the partition result of the dynamic graph, make the internal connection of the partitioned subgraphs close, the coupling degree between subgraphs is low, and avoid the re-partition of the whole dynamic graph, a dynamic graph vertex update method based on simulated annealing is proposed. Traditional vertex updating methods only judge whether the vertex position is transferred according to the adjacent nodes, which will result in local optimization. Based on the traditional vertex updating method, this method adds the factor of random disturbance. Through the design of energy function, update probability and cooling plan, it can avoid the result of vertex updating falling into local optimum, and make the connection edges between subgraphs minimum, so as to achieve the global optimum effect. Through experiments, this method is compared with the traditional vertex update method and the dynamic graph partition method based on community discovery. The results show that compared with the traditional vertex update method, this method reduces the number of crossed edges by 12.4%, and the load balance is reduced by 11.7. %. Compared with the dynamic graph partition method based on community discovery, the number of crossed edges is reduced by 9.3%, and the load balance degree is reduced by 2.2%. It can be obtained that the optimized method only achieves the partition of dynamic graphs through partial vertex position updates. Effectively avoid re-dividing the entire dynamic graph, and improve the quality of graph partition.

参考文献:

[1]Chen X, Pan L. A survey of graph cuts/graph search based medical image segmentation[J]. IEEE Reviews in Biomedical Engineering, 2018, 11(99):112-124.

[2]Barsky A, Munzner T, Gardy J, et al. Cerebral: Visualizing multiple experimental conditions on a graph with biological context[J]. IEEE Transactions on Visualization and Computer Graphics, 2008, 14(6):1253-1260.

[3]Low Y, Gonzalez J, Kyrola A, et al. Distributed GraphLab: A framework for machine learning in the cloud[J]. Proceedings of the Vldb Endowment, 2012, 5(8):716-727.

[4]Capponi A, Fiandrino C, Kliazovich D, et al. A cost-effective distributed framework for Data collection in cloud-Based mobile crowd sensing architectures[J]. IEEE Transactions on Sustainable Computing, 2017, 2(1):3-16.

[5]许金凤,董一鸿,王诗懿,等.大规模图数据划分算法综述[J].电信科学,2014,3(07):106-112.

[6]许金凤.大规模动态自适应图划分算法[D].宁波:宁波大学,2015.

[7]Malewicz G, Austern M H, Bik A J C, et al. Pregel: A system for large-scale graph processing[C]//Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. New York, NY, USA: ACM, 2010:135-146.

[8]Kernighan B W, Lin S. An efficient heuristic procedure for partitioning graphs[J]. Bell Labs Technical Journal, 1970, 49(2):291-307.

[9]Fiduccia C M, Mattheyses R M. A linear-time heuristic for improving network partitions[C]//19th Design Automation Conference. LasVegas,NV,USA:IEEE, 1982:175-181.

[10]Fiedler M. Algebraic connectivity of graphs[J]. Czechoslovak Mathematical Journal, 1973, 23(98):298-305.

[11]Barnard S T , Simon H D . Fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems[J]. Concurrency & Computation Practice & Experience, 2010, 6(2):101-117.

[12]Hendrickson B, Leland R. An improved spectral graph partitioning algorithm for mapping parallel computations[J]. SIAM Journal of Scientific Computing, 1995, 16(2):452-469.

[13]Garey M R, Johnson D S, Stockmeyer L J. Some simplified np-complete graph problems[J]. Theoretical Computer Science, 1976, 1(3):237-267.

[14]Hendrickson B,Leland R. A multilevel algorithm for partitioning graphs[C]//Proceedings of the 1995 ACM/IEEE Conference on Supercomputing. New York, NY, USA: ACM, 1995:28-36.

[15]Bui T N, Jones C. A heuristic for reducing fill in sparse matrix factorization[C]//Proceeding of the 6th SIAM Conference on Parallel Processing for Scientific Computing. Philadelphia. PA, USA: Society for Industrial & Applied Mathematics, 1993:445-452.

[16]Hauck S, Borriello G. An evaluation of bipartitioning technique[C]//Proceedings of the 16th Conference on Advanced Research in VLSI. Washington, DC,USA: IEEE, 1995:383-402.

[17]Cong J, Smith M. A parallel bottom-up clustering algorithm with applications to circuit partitioning in VLSI design[C]//Proceedings of the 30th International Design Automation Conference. New York, NY, USA:ACM, 1993:755-760.

[18]Karypis G, Kumar V. A fast and high quality multilevel scheme for partitioning irregular graphs[J]. Siam Journal on Scientific Computing, 1998, 20(1):359-392.

[19]Rahimian, Fatemeh, et al. "JA-BE-JA: A distributed algorithm for balanced graph partitioning."[C]//Proceedings of the 2013 IEEE 7th International Conference on Self-Adaptive and Self-Organizing Systems. New York, NY, USA: ACM, 2013:51-60.

[20]Stanton I, Kliot G. Streaming graph partitioning for large distributed graphs[C]//Acm Sigkdd International Conference on Knowledge Discovery & Data Mining. New York, NY, USA: ACM, 2013:1222-1230.

[21]Tsourakakis C E, Gkantsidis C, Radunovic B, et al. Fennel: Streaming graph partitioning for massive scale graphs[C]//Acm International Conference on Web Search & Data Mining. New York, NY, USA: ACM, 2014:333-342.

[22]Nishimura J, Ugander J. Restreaming graph partitioning: simple versatile algorithms for advanced balancing[C]//Acm Sigkdd International Conference on Knowledge Discovery & Data Mining. New York. NY, USA: ACM, 2013:1106-1114.

[23]Vaquero L, Cuadrado, Félix, Logothetis D, et al. Adaptive partitioning for large-scale dynamic graphs[C]//2014 IEEE 34th International Conference on Distributed Computing Systems. LasVegas,NV,USA: IEEE, 2014:144-153.

[24]Abdolrashidi A, Ramaswamy L. Continual and cost-effective partitioning of dynamic graphs for optimizing big graph processing systems[C]//2016 IEEE International Congress on Big Data. LasVegas,NV,USA: IEEE, 2016:18-25.

[25]Ju W, Li J, Yu W, et al. iGraph: an incremental data processing system for dynamic graph[J]. Frontiers of Computer Science, 2016, 10(3):462-476.

[26]Zhang W, Chen Y, Dai D. AKIN: A streaming graph partitioning algorithm for distributed graph storage systems[C]//IEEE/ACM International Symposium on Cluster. New York, NY, USA: ACM, 2018:183-192.

[27]Mayer C, Mayer R, Tariq M A, et al. ADWISE: Adaptive window-based streaming edge partitioning for high-speed graph processing[C]//2018 IEEE 38th International Conference on Distributed Computing Systems. LasVegas,NV,USA: IEEE, 2018:685-695.

[28]骆融臻.分布式流式图划分系统的设计与实现[D].南京:南京大学,2019.

[29]卢志刚,解婉婷.基于片段的企业信任网络演化图聚类算法[J].计算机应用,2018,38(01):270-276.

[30]尚宁.一种高效的大规模动态图划分算法的研究与实现[D].西安:西安电子科技大学,2019.

[31]刘喜勋,何苗.一边多权图的邻接矩阵实现技术和Dijkstra算法的改进[J].自动化与仪器仪表,2016,12(07):186-189.

[32]董加强.基于邻接表的图生成算法探讨[J].西昌学院学报(自然科学版),2009,23(02):43-45.

[33]张小艳.数据结构与算法设计[M].西安,西安科技大学出版社,2015.

[34]潘金贵.现代计算机常用数据结构和算法[M].南京,南京大学出版社,1994.

[35]黄明,吴炜.快刀初试:Spark GraphX在淘宝的实践[J].程序员,2014,0(8):98-103.

[36]雷霞.关于图的点划分问题的研究[D].福州:福州大学,2015.

[37]Nohuddin P N E, Sunayama W, Christley R, et al. Trend mining in social networks: from trend identification to visualization[J]. Expert Systems, 2014, 31(5):457-468.

[38]周爽,鲍玉斌,王志刚,等.BHP:面向BSP模型的负载均衡Hash图数据划分[J].计算机科学与探索,2014,8(01):40-50.

[39]王晶.若干图类交叉数的研究[D].长沙:湖南师范大学,2009.

[40]高峰.基于BSP模型的大图处理系统数据划分模块的设计与实现[D].沈阳:东北大学,2012.

[41]Karypis G, Kumar V. A fast and high quality multilevel scheme for partitioning irregular graphs [J]. Siam Journal on Scientific Computing. 2006, 20(1):359–392.

[42]Karypis G, Kumar V. Analysis of multilevel graph partitioning[C]//Proceedings of the IEEE/ACM SC95 Conference. New York, NY, USA: ACM, 1995:29–30.

[43]Karypis G, Kumar V. Metis: a software package for partitioning unstructured graphs [J]. International Cryogenics Monograph. 1998,33(02):121–124.

[44]Karypis G, Kumar V. METIS – unstructured graph partitioning and sparse matrix ordering system, version 2.0[D]. Minneapolis MN, University of Minnesota, 1995.

[45]颜云志,王汉兴.一类幂律图模型[J].上海交通大学学报.2009,43(08):1347-1349.

[46]Vaquero L, Cuadrado F, Logothetis D, et al. xDGP: A dynamic graph processing system with adaptive partitioning[EB/OL]. Arxiv:1309.1049. 2013.

[47]王莉,程学旗.在线社会网络的动态社区发现及演化[J].计算机学报,2015,38(2):219-237.

[48]桂春.改进的谱二分法在重叠社团发现中的应用[J].武汉大学学报(工学版),2019,52(04):372-376.

[49]马锐,高浩然,窦伯文,等.基于改进GN算法的程序控制流图划分方法[J].清华大学学报(自然科学版),2019,59(01):15-22.

[50]蒋鹤,范小晶,封学军,蒋柳鹏.基于Newman快速算法的航运网络社团结构[J].长沙理工大学学报(自然科学版),2018,15(03):35-39.

[51]庞峰.模拟退火算法的原理及算法在优化问题上的应用[D].吉林:吉林大学,2006.

[52]Christen P. A survey of indexing techniques for scalable record linkage and deduplication [J]. IEEE Trans on Knowledge and Data Engineering. 2012,24(9):1537-1555.

中图分类号:

 TP301.6    

开放日期:

 2021-06-21    

无标题文档

   建议浏览器: 谷歌 火狐 360请用极速模式,双核浏览器请用极速模式