ISU Electrical and Computer Engineering Archives

Mining Maximal Cliques from a Large Graph using MapReduce: Tackling Highly Uneven Subproblem Sizes

Svendsen, Michael and Tirthapura, Srikanta (2012) Mining Maximal Cliques from a Large Graph using MapReduce: Tackling Highly Uneven Subproblem Sizes. Publisher UNSPECIFIED.

Full text available as:

PDF - Requires Adobe Acrobat Reader or other PDF viewer.

Abstract

We consider Maximal Clique Enumeration (MCE) on a large graph. A maximal clique is perhaps the most fundamental dense substructure within a graph, and MCE is an important tool to discover densely connected subgraphs in a graph, with numerous applications in data mining on web graphs, social networks, and biological networks. While MCE is well studied in the sequential case, relatively little is known about eective methods for parallel MCE. We present a new parallel algorithm for MCE, Parallel Enumeration of Cliques using Ordering, PECO. Unlike previous works, which require a post-processing step to remove duplicate and non-maximal cliques, PECO enumerates only maximal cliques with no duplicates. The key technical ingredient is a total ordering of the vertices of the graph which is used to achieve a load balanced distribution of work, and to eliminate redundant work among processors. An important feature of PECO is that the total work of parallel enumeration is equal to the work of a variant of a popular sequential algorithm. We have implemented PECO on the MapReduce framework, and our experiments on a large cloud computing testbed show that the algorithm is able to enumerate maximal cliques in a variety of large real-world graphs of millions of vertices and tens of millions of maximal cliques, and scales well with an increasing number of reducers. A comparison of ordering strategies shows that an ordering based on vertex degree performs the best, improving load balance and reducing total work when compared to the other strategies.

EPrint Type:Technical Report
Uncontrolled Keywords:Maximal Cliques, MapReduce
Subjects:Computer Engineering > SOFTWARE SYSTEMS > Parallel and Distributed Computing
ID Code:623
Identification Number:Identification Number UNSPECIFIED
Deposited By:Mr. Michael Svendsen
Deposited On:09 December 2012

Archive Staff Only: edit this record