Skip to main content

Research Repository

Advanced Search

Optimisation techniques for finding connected components in large graphs using GraphX

Turifi, M

Optimisation techniques for finding connected components in large graphs using GraphX Thumbnail


Authors

M Turifi



Contributors

Abstract

The problem of finding connected components in undirected graphs has been well studied. It is an essential pre-processing step to many graph computations, and a fundamental task in graph analytics applications, such as social network analysis, web graph mining and image processing. Recently, it has been a major area of interest within the field of large graph processing. However, much of the research has focused on solving the problem using High Performance Computers (HPC). In large distributed systems, the MapReduce framework dominates the processing of big data, and has been used for finding connected components in big graphs although iterative processing is not directly supported in MapReduce. Current big data processing systems have developed into supporting iterative processing and providing additional features other than MapReduce. Moreover, current connected component algorithms in large distributed processing system only use the traditional approach to choosing the component identifier based on the lexical ordering of the node ID value. This study investigates how to enhance the performance of finding connected components algorithm for large graph in distributed processing system. It uses the approach to considering the graph degree property in choosing the component identifier, reviewing how this can affect the efficiency of the algorithm. In the design of our proposed algorithm features provided by current new processing systems such as moving the computation more toward the data partition in Spark are integrated. This study thus review how this has affected the performance. The degree approach to choosing the component identifier is experimentally tested using different algorithms. The study then applies the proposed approach on the fastest existing algorithm, and experimentally compare the performance of the connected component algorithm using both the original and our modified algorithm. The results show that using the degree approach has played a vital role in the evolution of the graph size during the process, leading to a faster convergence and significant performance improvement when case vertex pruning is applied in the algorithms. Furthermore, they demonstrate that in many cases optimising the design of the algorithm with local pre-processing of the data has resulted in performance enhancement.

Citation

Turifi, M. (in press). Optimisation techniques for finding connected components in large graphs using GraphX. (Thesis). The University of Salford

Thesis Type Thesis
Acceptance Date Feb 1, 2018
Deposit Date Sep 19, 2018
Publicly Available Date Oct 19, 2018
Award Date Jan 11, 2018

Files






You might also like



Downloadable Citations