# GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

@article{Xin2014GraphXUD, title={GraphX: Unifying Data-Parallel and Graph-Parallel Analytics}, author={Reynold Xin and Daniel Crankshaw and Ankur Dave and Joseph E. Gonzalez and Michael J. Franklin and Ion Stoica}, journal={ArXiv}, year={2014}, volume={abs/1402.2394} }

From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance…

#### 79 Citations

Systems for Big-Graphs

- Computer Science
- Proc. VLDB Endow.
- 2014

This tutorial discusses the design of the emerging systems for processing of big-graphs, key features of distributed graph algorithms, as well as graph partitioning and workload balancing techniques, and highlights the current challenges and some future research directions.

The Taxonomy of Distributed Graph Analytics

- Computer Science
- 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS)
- 2018

This paper aims to provide the taxonomy of various distributed programming models, distributed graph processing frameworks and various kinds of graph analytics that are essential for the analysis of large-scale networks.

A communication-reduced and computation-balanced framework for fast graph computation

- Computer Science
- Frontiers of Computer Science
- 2018

Evaluation of LCC-Graph on a 32-node cluster, driven by real-world graph datasets, shows that it significantly outperforms existing distributed graph-processing frameworks in terms of runtime, particularly when the system is supported by a high-bandwidth network.

GraphU: A Unified Vertex-Centric Parallel Graph Processing Platform

- Computer Science
- 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS)
- 2018

This work proposes a framework of complexity analysis for DFA-G automaton and shows that it can significantly facilitate complexity analysis on asynchronous programs, and develops a new prototype platform, GraphU, which entirely removes synchronization barriers and decouples remote communication from vertex computation.

On Improving Distributed Pregel-like Graph Processing Systems

- Computer Science
- 2015

The considerable interest in distributed systems that can execute algorithms to process large graphs has led to the creation of many graph processing systems. However, existing systems suffer from…

A Comparative Evaluation of Big Data Frameworks for Graph Processing

- Computer Science
- 2018 4th International Conference on Big Data Innovations and Applications (Innovate-Data)
- 2018

This paper focuses on the scalability of GraphX and Gelly with respect to increasing data volumes and their ability to distribute work between multiple processing nodes in a cluster and shows that choosing between different computing models offered by the frameworks can significantly influence the performance of big data graph computations.

Distributed graph cube generation using Spark framework

- Computer Science
- The Journal of Supercomputing
- 2019

The GraphNaïve and GraphTDC algorithms are proposed, which sequentially compute graph cuboids for all dimensions in a graph, while the Generate Multi-Dimension Table method is proposed to efficiently create a multidimensional graph table to express the graph.

LCC-Graph: A high-performance graph-processing framework with low communication costs

- Computer Science
- 2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS)
- 2016

Evaluation of LCC-Graph on a 32-node cluster, driven by real-world graph datasets, shows that it significantly outperforms existing distributed graph-processing frameworks in terms of runtime, particularly when the system is supported by a high-bandwidth network.

Management and Analysis of Big Graph Data: Current Systems and Open Challenges

- Computer Science
- Handbook of Big Data Technologies
- 2017

This chapter surveys current system approaches for management and analysis of “big graph data”, and outlines a recent research framework called Gradoop that is built on the so-called Extended Property Graph Data Model with dedicated support for analyzing not only single graphs but also collections of graphs.

VENUS: Vertex-centric streamlined graph computation on a single PC

- Computer Science
- 2015 IEEE 31st International Conference on Data Engineering
- 2015

VENUS is a disk-based graph computation system which is able to handle billion-scale problems efficiently on a commodity PC and adopts a novel computing architecture that features vertex-centric “streamlined” processing.

#### References

SHOWING 1-10 OF 26 REFERENCES

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs

- Computer Science
- OSDI
- 2012

This paper describes the challenges of computation on natural graphs in the context of existing graph-parallel abstractions and introduces the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges.

GraphBuilder: scalable graph ETL framework

- Computer Science
- GRADES
- 2013

The motivation for GraphBuilder, its architecture, MapReduce algorithms, and performance evaluation of the framework are described, and several graph partitioning methods are developed and evaluated.

Pregel: a system for large-scale graph processing

- Computer Science
- SIGMOD Conference
- 2010

A model for processing large graphs that has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier.

Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud

- 2012

While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important…

Signal/Collect: Graph Algorithms for the (Semantic) Web

- Computer Science
- International Semantic Web Conference
- 2010

This paper presents the Signal/Collect programming model for synchronous and asynchronous graph algorithms and demonstrates that this abstraction can capture the essence of many algorithms on graphs in a concise and elegant way by giving Signal/Collect adaptations of various relevant algorithms.

X-Stream: edge-centric graph processing using streaming partitions

- Computer Science
- SOSP
- 2013

X-Stream is novel in using an edge-centric rather than a vertex-centric implementation of this model, and streaming completely unordered edge lists rather than performing random access, and competes favorably with existing systems for graph processing.

The Combinatorial BLAS: design, implementation, and applications

- Computer Science
- Int. J. High Perform. Comput. Appl.
- 2011

The parallel Combinatorial BLAS is described, which consists of a small but powerful set of linear algebra primitives specifically targeting graph and data mining applications, and an extensible library interface and some guiding principles for future development are provided.

Spinning Fast Iterative Data Flows

- Computer Science
- Proc. VLDB Endow.
- 2012

This work proposes a method to integrate incremental iterations, a form of workset iterations, with parallel dataflows and presents an extension to the programming model for incremental iterations that compensates for the lack of mutable state in dataflow and allows for exploiting the sparse computational dependencies inherent in many iterative algorithms.

Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks

- Computer Science, Physics
- WWW
- 2011

Experiments performed show that combining the order produced by the proposed algorithm with the WebGraph compression framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks.

Naiad: a timely dataflow system

- Computer Science
- SOSP
- 2013

It is shown that many powerful high-level programming models can be built on Naiad's low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining.