Exploring open-source graph computing frameworks

Amy Kelly
3 min read · Jul 5, 2021

In recent years, as server computing speed has kept improving and data storage costs have kept falling, knowledge graphs, sensor networks, and social networks have developed rapidly, making graph data an important source of big data. The world's top 100 companies are gradually increasing their use of graph databases, and graph-driven algorithms and analysis are expected to become key technologies of the 2020s.

As graph computing continues to grow in popularity, understanding and mastering graph computing frameworks will help data scientists discover the interdependencies and correlations in big data. Below, I introduce several well-known graph computing frameworks in the industry.

Framework 1: NetworkX

NetworkX was created in May 2002. It is a graph theory and complex network modeling tool written in Python. The package provides classes for graph objects, generators to create standard graphs, IO routines for reading in existing datasets, algorithms to analyze the resulting networks, and some basic drawing tools. With NetworkX, you can store networks in standardized and non-standardized data formats, analyze network structures, build network models, design new network algorithms, and draw networks.
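As a minimal sketch of the features listed above, the snippet below builds a small graph, inspects its structure, and uses one of the standard-graph generators. The edge list is toy data made up for illustration; the calls themselves (`nx.Graph`, `add_edges_from`, `degree`, `shortest_path`, `complete_graph`) are regular NetworkX functions.

```python
import networkx as nx

# Build a small undirected graph from an edge list (toy data for illustration).
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("d", "e")])

# Analyze the structure: size, per-node degree, and a shortest path.
num_nodes = G.number_of_nodes()
num_edges = G.number_of_edges()
degrees = dict(G.degree())
path = nx.shortest_path(G, source="b", target="e")

# Use a generator to create a standard graph: the complete graph on 4 nodes.
K4 = nx.complete_graph(4)

print(num_nodes, num_edges)
print(degrees)
print(path)
print(K4.number_of_edges())
```

Node labels can be any hashable Python object, which is part of what makes NetworkX convenient for quick exploration on a single machine.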

Framework 2: GraphX

GraphX is the Spark API for graphs and graph-parallel computation. Its core abstraction, the property graph, is parameterized over the vertex and edge types, that is, the types of the objects associated with each vertex and each edge respectively. GraphX combines graph-parallel and data-parallel computing in a single system, which lets it execute complex graph algorithms efficiently.
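To make the idea of a graph "parameterized over the vertex and edge types" concrete, here is a single-machine sketch in plain Python. It is not GraphX code: GraphX itself is a distributed Scala API built on Spark, and the names `PropertyGraph`, `Edge`, `out_degrees`, and `map_vertices` below are hypothetical, chosen only to illustrate the concept.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, List, Tuple, TypeVar

VD = TypeVar("VD")  # type of the attribute attached to each vertex
ED = TypeVar("ED")  # type of the attribute attached to each edge
VD2 = TypeVar("VD2")


@dataclass
class Edge(Generic[ED]):
    src: int   # source vertex id
    dst: int   # destination vertex id
    attr: ED   # edge attribute


@dataclass
class PropertyGraph(Generic[VD, ED]):
    vertices: Dict[int, VD]
    edges: List[Edge[ED]]

    def out_degrees(self) -> Dict[int, int]:
        """Count outgoing edges per vertex (a graph-style operation)."""
        counts = {vid: 0 for vid in self.vertices}
        for e in self.edges:
            counts[e.src] += 1
        return counts

    def map_vertices(self, f: Callable[[int, VD], VD2]) -> "PropertyGraph[VD2, ED]":
        """Transform vertex attributes, keeping the edges (a data-style operation)."""
        return PropertyGraph({vid: f(vid, a) for vid, a in self.vertices.items()}, self.edges)


# Toy data: vertices carry (name, role) tuples, edges carry a relationship label.
graph: PropertyGraph[Tuple[str, str], str] = PropertyGraph(
    vertices={1: ("alice", "student"), 2: ("bob", "postdoc"), 3: ("carol", "professor")},
    edges=[Edge(1, 2, "collab"), Edge(3, 2, "advisor"), Edge(3, 1, "advisor")],
)

degrees = graph.out_degrees()
names = graph.map_vertices(lambda vid, attr: attr[0])
print(degrees)
print(names.vertices)
```

The two methods hint at the combination described above: `out_degrees` looks at the graph structure, while `map_vertices` treats the vertices as an ordinary collection of records.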

Although distributed graph systems can compute PageRank and other algorithms very well, they are not suited to constructing graphs from different data sources or to computing features across multiple graphs. Their computational view cannot handle the requirements of constructing and transforming graph structures. A graph pipeline therefore has to combine graph-parallel and data-parallel systems. This combination inevitably leads to a large amount of data movement and data replication, and it makes the overall system extremely complicated.

Framework 3: cuGraph

cuGraph is a collection of graph algorithms implemented by NVIDIA for NVIDIA GPUs. Traditional machine learning workflows often waste a lot of resources. To improve GPU utilization, RAPIDS includes the cuGraph module, designed specifically to process graph data.

cuGraph exposes these GPU-accelerated graph algorithms through Python. It speeds up graph processing and is mainly used for high-performance graph analytics. Data scientists and software developers can use these libraries to integrate GPU functionality into their work without understanding all the implementation and hardware details. cuGraph is a graph data analysis framework and cannot be used to replace a graph database: graph databases emphasize common database operations such as adding, deleting, modifying, and querying data, whereas cuGraph focuses on analyzing graph data.

Framework 4: GraphScope

GraphScope is a one-stop graph computing platform developed and open-sourced by the Intelligent Computing Laboratory of Alibaba DAMO Academy. It draws on Alibaba's massive data and rich application scenarios.

GraphScope supports automatic parallelization of algorithms, automatic incremental processing of dynamic graph updates, data partitioning, and metadata management. It provides zero-copy data reading and convenient one-stop development for upper-layer applications. According to its developers, it is the first in the industry to support distributed compilation and optimization of Gremlin queries. It also provides a range of common analysis algorithms, such as connectivity and numerical computations, and aims for high performance in enterprise-level scenarios.
