PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs

Graphs are Ubiquitous

Natural Graphs (having Power-Law Distribution)

powergraph_star_motif.PNG powergraph_power_law_distribution

Classic Distributed Graph Approach:

Pregel Communication Overhead for Highly-Connected Vertex

GraphLab Communication Overhead for Highly-Connected Vertex

Graph Partitioning: Cut Across Edges

PowerGraph

New: Cut Across Vertices

powergraph_cut_vertex

Workflow Observation

Distributed Execution

powergraph_distributed_execution

  1. Gather: Machine 1,2,3 and 4 locally gather neighbor data for a local partial sum
  2. Gather: Master Machine 1 receives the partial sums
  3. Gather: Master Machine 1 sums all partial sums
  4. Apply: Master Machine 1 updates local node and sends updates to Machines 2,3,4 so they update their local node
  5. Scatter : Within each machine, new node info is communicated to on-machine neighbors

Communication Overhead

Edge Placements

Random

powergraph_random_edge_placement

Coordinated Greedy

Oblivious Greedy

Fault-Tolerance

Performance

PowerGraph Placement: Random vs Coordinated vs Oblivious

powergraph_placement_compare

PowerGraph vs Other Systems

powergraph_perf_1

powergraph_perf_3

powergraph_perf_2