M Turifi
Optimisation techniques for finding connected components in large graphs using GraphX
Turifi, M
Abstract
The problem of finding connected components in undirected graphs has been well studied. It is an essential pre-processing step to many graph computations, and a fundamental task in graph analytics applications, such as social network analysis, web graph mining and image processing. Recently, it has been a major area of interest within the field of large graph processing. However, much of the research has focused on solving the problem using High Performance Computers (HPC). In large distributed systems, the MapReduce framework dominates the processing of big data, and has been used for finding connected components in big graphs although iterative processing is not directly supported in MapReduce. Current big data processing systems have developed into supporting iterative processing and providing additional features other than MapReduce. Moreover, current connected component algorithms in large distributed processing system only use the traditional approach to choosing the component identifier based on the lexical ordering of the node ID value. This study investigates how to enhance the performance of finding connected components algorithm for large graph in distributed processing system. It uses the approach to considering the graph degree property in choosing the component identifier, reviewing how this can affect the efficiency of the algorithm. In the design of our proposed algorithm features provided by current new processing systems such as moving the computation more toward the data partition in Spark are integrated. This study thus review how this has affected the performance. The degree approach to choosing the component identifier is experimentally tested using different algorithms. The study then applies the proposed approach on the fastest existing algorithm, and experimentally compare the performance of the connected component algorithm using both the original and our modified algorithm. The results show that using the degree approach has played a vital role in the evolution of the graph size during the process, leading to a faster convergence and significant performance improvement when case vertex pruning is applied in the algorithms. Furthermore, they demonstrate that in many cases optimising the design of the algorithm with local pre-processing of the data has resulted in performance enhancement.
Citation
Turifi, M. (in press). Optimisation techniques for finding connected components in large graphs using GraphX. (Thesis). The University of Salford
Thesis Type | Thesis |
---|---|
Acceptance Date | Feb 1, 2018 |
Deposit Date | Sep 19, 2018 |
Publicly Available Date | Oct 19, 2018 |
Award Date | Jan 11, 2018 |
Files
Submitted Thesis Maher Turifi.pdf
(3.9 Mb)
PDF
You might also like
Optimizing the Parameters of Relay Selection Model in D2D Network
(2024)
Conference Proceeding
Multiclass Classification and Defect Detection of Steel tube using modified YOLO
(2023)
Conference Proceeding
Downloadable Citations
About USIR
Administrator e-mail: library-research@salford.ac.uk
This application uses the following open-source libraries:
SheetJS Community Edition
Apache License Version 2.0 (http://www.apache.org/licenses/)
PDF.js
Apache License Version 2.0 (http://www.apache.org/licenses/)
Font Awesome
SIL OFL 1.1 (http://scripts.sil.org/OFL)
MIT License (http://opensource.org/licenses/mit-license.html)
CC BY 3.0 ( http://creativecommons.org/licenses/by/3.0/)
Powered by Worktribe © 2024
Advanced Search