Benchmark Results – ArangoDB vs. Neo4j: ArangoDB up to 8x faster than Neo4j

Introduction

This document presents benchmark results comparing ArangoDB’s Graph Analytics Engine (GAE) against Neo4j. The GAE is one component of ArangoDB’s Data Science Suite.

This reproducible benchmark aims to provide a neutral and thorough comparison between the two databases, ensuring a fair and unbiased assessment.

We use the wiki-Talk dataset, a widely used, real-world graph dataset derived from the edit and discussion history of Wikipedia.

The wiki-Talk dataset captures communication patterns between Wikipedia users, specifically interactions on user talk pages. It is frequently used to benchmark graph databases and graph analytics systems because it is a directed, sparse, real-world graph of substantial scale (about 2.4 million vertices and 5 million edges) with a temporal dimension.

The results demonstrate the efficiency and scalability of each database, and offer a representative benchmark model for organizations evaluating graph databases for their needs.

Benchmark Highlights

The benchmark results reveal several notable insights, particularly highlighting ArangoDB's superior performance in graph analytics tasks compared to Neo4j. Most strikingly:

  • ArangoDB consistently outperformed Neo4j across the graph computation algorithms tested, with speedups on wiki-Talk ranging from 1.7 times to over 8 times.
  • This speed advantage is also evident in graph loading, where ArangoDB loaded the wiki-Talk dataset roughly 1.8 times faster than Neo4j.

ArangoDB's optimized data storage and retrieval, combined with its advanced query execution and effective use of clustered deployments, also contributed significantly to its superior performance in these scenarios.

These findings underscore:

  • ArangoDB's capability to handle much larger-scale and far faster real-time graph analytics applications.
  • ArangoDB as a much more compelling choice for industries and organizations that require rapid data processing and analysis, such as real-time recommendation systems, social network analysis, fraud detection, and cyber security.

Benchmark Overview

Datasets (wiki-Talk)

We used the wiki-Talk dataset, a well-regarded dataset for evaluating graph database performance. Its details are as follows:

Graphs Used     Vertices      Edges
wiki-Talk       2,394,385     5,021,410

Hardware

All tests were conducted on the same machine with the following specifications:

          OS        Ubuntu 23.10 (64-bit)
          Memory    192 GB (4800 MHz)
          CPU       Ryzen 9 7950X3D (16 Cores, 32 Threads)

Database Configuration

          ***Neo4j***

          Version       5.19.0 (Community Edition)
          Deployment    On-Premise, Single Process

          ***ArangoDB***

          Version       3.12.0-NIGHTLY.20240305 (Community Edition)
          Deployment    On-Premise, Single Process

          ***Graph Analytics Engine (GAE)***

          Version       Latest
          Deployment    On-Premise, Single Process (Rust-based, no multithreading)

Benchmark Configuration

Two workflows were used to measure performance:

Workflow A:

  1. Create the in-memory representation
  2. Execute each algorithm once
  3. Measure the whole process

Workflow B:

  1. Create the in-memory representation
  2. Measure graph creation time
  3. Execute each algorithm individually
  4. Measure computation time
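
The two workflows differ only in where the timers start and stop. The following is a minimal sketch of that difference, not the benchmark's actual harness: `loadGraph` and `runAlgorithm` are hypothetical callbacks standing in for the GAE or Neo4j calls, and the code is synchronous to keep the timing logic visible (the real benchmark runs asynchronously under Vitest with tinybench).

```javascript
// Hypothetical stand-ins: loadGraph() builds the in-memory graph,
// runAlgorithm(graph, name) executes one algorithm on it.
// now() returns milliseconds and is injectable so the logic can be checked.

// Workflow A: one timer around graph creation plus every algorithm.
function workflowA(loadGraph, runAlgorithm, algorithms, now = () => performance.now()) {
  const start = now();
  const graph = loadGraph();
  for (const name of algorithms) {
    runAlgorithm(graph, name);
  }
  return { totalMs: now() - start };
}

// Workflow B: graph creation and each algorithm are timed separately.
function workflowB(loadGraph, runAlgorithm, algorithms, now = () => performance.now()) {
  const loadStart = now();
  const graph = loadGraph();
  const loadMs = now() - loadStart;

  const computeMs = {};
  for (const name of algorithms) {
    const start = now();
    runAlgorithm(graph, name);
    computeMs[name] = now() - start;
  }
  return { loadMs, computeMs };
}
```

Workflow A yields a single end-to-end figure, while Workflow B separates loading cost from per-algorithm cost, which is the breakdown reported in the result tables.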

Algorithms Tested

  • PageRank
  • Weakly Connected Components (WCC)
  • Strongly Connected Components (SCC)
  • Label Propagation

Used Technologies

  • JavaScript Framework: Vitest with tinybench
  • Communication
    • Neo4j: Official Neo4j JS driver ("neo4j-driver": "^5.18.0")
    • GAE: Plain HTTPS requests using Axios ("axios": "^1.6.8")

Benchmark Results

Graph Loading (wiki-Talk)

Task                                    GAE (sec)    Neo4j (sec)    Times Faster
Load graph wiki-Talk                    9.9          18             1.8 x
Load graph wiki-Talk with attributes    10.7         19.2           1.8 x

Graph Computation (wiki-Talk)

Task                         GAE (sec)    Neo4j (sec)    Times Faster
Compute PageRank             3.8          10.6           2.8 x
Compute WCC                  2.3          4.5            1.7 x
Compute SCC                  3.2          6.7            2.1 x
Compute Label Propagation    1.5          13             8.5 x
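
The "Times Faster" column can be read as the Neo4j time divided by the GAE time. The sketch below recomputes that ratio from the rounded seconds shown in the tables. Because the inputs are rounded, the recomputed values can differ slightly from the published column, which was presumably derived from unrounded measurements (an assumption, since the source does not say).

```javascript
// Timings in seconds, copied from the wiki-Talk tables above.
const results = [
  { task: "Load graph", gae: 9.9, neo4j: 18 },
  { task: "Load graph with attributes", gae: 10.7, neo4j: 19.2 },
  { task: "PageRank", gae: 3.8, neo4j: 10.6 },
  { task: "WCC", gae: 2.3, neo4j: 4.5 },
  { task: "SCC", gae: 3.2, neo4j: 6.7 },
  { task: "Label Propagation", gae: 1.5, neo4j: 13 },
];

// "Times faster" is the baseline time divided by the candidate time.
function timesFaster(baselineSec, candidateSec) {
  return baselineSec / candidateSec;
}

for (const { task, gae, neo4j } of results) {
  console.log(`${task}: ${timesFaster(neo4j, gae).toFixed(1)} x`);
}
```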

Explanation of Elements

Graph Algorithms

  • PageRank: ranks nodes in a graph based on their connections; also commonly used in search engines.
  • Weakly Connected Components (WCC): identifies subsets of a graph in which any two vertices are connected by a path when the direction of edges is ignored.
  • Strongly Connected Components (SCC): identifies subsets of a graph in which every vertex is reachable from every other vertex in the same subset, following edge directions.
  • Label Propagation: a semi-supervised learning algorithm for community detection in graphs, in which nodes iteratively propagate their labels to their neighbors.
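
To make the difference between weakly and strongly connected components concrete, here is a small sketch on a four-vertex graph. It uses textbook techniques (union-find for WCC, Kosaraju's two-pass algorithm for SCC) chosen for illustration only; the source does not describe how the GAE or Neo4j implement these algorithms, and the function names are hypothetical.

```javascript
// Weakly connected components: union-find over edges, direction ignored.
function weaklyConnectedComponents(vertexCount, edges) {
  const parent = Array.from({ length: vertexCount }, (_, i) => i);
  const find = (v) => {
    while (parent[v] !== v) {
      parent[v] = parent[parent[v]]; // path halving
      v = parent[v];
    }
    return v;
  };
  for (const [from, to] of edges) {
    parent[find(from)] = find(to);
  }
  return parent.map((_, v) => find(v)); // component id per vertex
}

// Strongly connected components: Kosaraju's two-pass algorithm.
// Recursive, so only suitable for small illustrative graphs.
function stronglyConnectedComponents(vertexCount, edges) {
  const out = Array.from({ length: vertexCount }, () => []);
  const rev = Array.from({ length: vertexCount }, () => []);
  for (const [from, to] of edges) {
    out[from].push(to);
    rev[to].push(from);
  }
  // Pass 1: record vertices in order of DFS completion.
  const visited = new Array(vertexCount).fill(false);
  const order = [];
  const dfs1 = (v) => {
    visited[v] = true;
    for (const w of out[v]) if (!visited[w]) dfs1(w);
    order.push(v);
  };
  for (let v = 0; v < vertexCount; v++) if (!visited[v]) dfs1(v);

  // Pass 2: walk the reversed graph in reverse completion order.
  const component = new Array(vertexCount).fill(-1);
  const dfs2 = (v, id) => {
    component[v] = id;
    for (const w of rev[v]) if (component[w] === -1) dfs2(w, id);
  };
  for (let i = order.length - 1; i >= 0; i--) {
    const v = order[i];
    if (component[v] === -1) dfs2(v, v);
  }
  return component;
}

// 0 <-> 1, 1 -> 2, and vertex 3 is isolated.
const edges = [[0, 1], [1, 0], [1, 2]];
const wcc = weaklyConnectedComponents(4, edges);
const scc = stronglyConnectedComponents(4, edges);
console.log(new Set(wcc).size, new Set(scc).size); // prints: 2 3
```

Ignoring direction, vertices 0, 1 and 2 form one component and vertex 3 another. Respecting direction, vertex 2 cannot reach back to 0 or 1, so it becomes its own strongly connected component.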

Reasons for ArangoDB’s Superior Performance

Several factors contribute to ArangoDB's superior performance.

ArangoDB's performance on the wiki-Talk dataset stems from specific architectural optimizations rather than from raw computational power alone. In this setup, ArangoDB serves as the data storage system, while computation is handled by the Graph Analytics Engine (GAE). The benchmark focuses on two key stages:

  1. Loading the data into the GAE
  2. Computation of algorithms within the GAE

Graph Loading Times

ArangoDB Side

ArangoDB’s graph loading times are optimized due to two primary factors:

  1. Parallel Data Extraction: ArangoDB supports parallel data loading from both single-server and distributed deployments, which is a major reason for its loading performance. This capability lets teams scale out to multiple machines, where increased parallelism yields faster data transfer. Efficient horizontal scaling gives significant performance improvements over approaches that extract data sequentially.
  2. Projections for Targeted Data Transfer: Projections allow ArangoDB to transmit only the data attributes required for analysis. If only edge IDs and a single attribute are needed, the system extracts and transfers only these fields, avoiding the overhead of transmitting entire documents. This reduces both the data volume and the transfer time during graph loading.
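
The effect of projections can be illustrated in a few lines. This is a sketch of the idea only, not ArangoDB's implementation: `project` is a hypothetical helper, and the edge document's attributes other than `_key`, `_from` and `_to` are made up for the example.

```javascript
// Keep only the named attributes of a document (a "projection").
function project(doc, fields) {
  const out = {};
  for (const field of fields) {
    if (field in doc) out[field] = doc[field];
  }
  return out;
}

// Hypothetical edge document; weight, comment and createdAt are invented.
const edge = {
  _key: "1001",
  _from: "users/42",
  _to: "users/7",
  weight: 0.8,
  comment: "a long free-text attribute that the algorithm never reads",
  createdAt: "2024-03-05T12:00:00Z",
};

const projected = project(edge, ["_from", "_to", "weight"]);

// Serialized length is a rough proxy for what would cross the network.
const fullChars = JSON.stringify(edge).length;
const projectedChars = JSON.stringify(projected).length;
console.log(projected, `${projectedChars} of ${fullChars} characters`);
```

With millions of edges, dropping unused attributes before transfer shrinks the payload proportionally, which is the saving the second point describes.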

Graph Analytics Engine (GAE) Side

The GAE is built in Rust and processes the transferred data with high efficiency:

  • Efficient Data Representation
    The GAE stores graph data within highly optimized in-memory structures, reducing memory usage while at the same time maintaining extremely fast access speeds. Graphs are immediately ready for computation without unnecessary delays.
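
The source does not describe the GAE's internal data layout. As an illustration of what a compact in-memory graph representation can look like, the sketch below builds a compressed sparse row (CSR) structure, a common layout that stores all adjacency lists in two flat typed arrays. CSR is an assumption chosen for the example, not a statement about the GAE's internals, and `buildCsr` and `neighbors` are hypothetical names.

```javascript
// Build a compressed sparse row (CSR) adjacency structure from an edge list.
// offsets has vertexCount + 1 entries; the out-neighbors of v are
// targets[offsets[v]] .. targets[offsets[v + 1] - 1].
function buildCsr(vertexCount, edges) {
  const offsets = new Uint32Array(vertexCount + 1);
  // Count the out-degree of each vertex, shifted by one slot.
  for (const [from] of edges) offsets[from + 1]++;
  // Prefix sums turn the counts into start positions.
  for (let v = 0; v < vertexCount; v++) offsets[v + 1] += offsets[v];

  const targets = new Uint32Array(edges.length);
  // cursor[v] is the next free slot in v's slice of targets.
  const cursor = offsets.slice(0, vertexCount);
  for (const [from, to] of edges) targets[cursor[from]++] = to;
  return { offsets, targets };
}

// View of v's out-neighbors without copying.
function neighbors(csr, v) {
  return csr.targets.subarray(csr.offsets[v], csr.offsets[v + 1]);
}

const csr = buildCsr(4, [[0, 1], [0, 2], [2, 3], [1, 2]]);
console.log(Array.from(neighbors(csr, 0))); // prints: [ 1, 2 ]
```

A layout like this needs one integer per edge plus one per vertex, and iterating a vertex's neighbors is a contiguous array scan, which is why it is a frequent choice for analytics workloads.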

Advantages in the Workflow

These features deliver several tangible benefits, as shown during the benchmark:

  1. Fast and Parallel Data Extraction - Parallelism improves speed and scalability. 
  2. Optimized Data Transfer with Projections - Only the required data is transmitted, minimizing overhead. 
  3. Compact and Efficient In-Memory Representation in the GAE - High-performance graph computation with minimal memory footprint.

Clarifying the Benchmark Scope

It is important to note that the benchmark does not evaluate data insertion times into ArangoDB or computational tasks performed by ArangoDB itself. Instead, it assesses the efficiency of:

  • Loading graph data from ArangoDB into the GAE.
  • The GAE's ability to compute graph algorithms.

By highlighting these stages, the benchmark shows the advantages of ArangoDB’s design in supporting large-scale graph workflows through fast data loading and efficient interaction with the GAE.

Reproducibility of the Benchmark

This benchmark is 100% reproducible, ensuring consistent and verifiable results. These results reflect ArangoDB’s implementation per the precise specifications and configurations mentioned above. We welcome organizations to replicate the benchmark to ensure consistent results. To do this, follow these steps:

  1. First, set up the hardware environment with an Ubuntu 23.10 operating system, 192 GB of memory, and a Ryzen 9 7950X3D CPU.
  2. Install and configure Neo4j and ArangoDB in the versions listed above, using the provided Docker configurations. Use single-process (non-clustered) configurations for both.
  3. Next, utilize the wiki-Talk dataset for testing. Execute the specified graph algorithms (PageRank, WCC, SCC, Label Propagation) using the detailed workflows (A and B) outlined in the benchmark configuration above.
  4. Measure the in-memory graph creation and computation times, and compare the results for both databases. This method ensures that the benchmark can be reliably reproduced in different environments.

PLEASE NOTE: This benchmark requires the installation of the ArangoDB Graph Analytics Engine (GAE). As this code is not open source, please reach out to Corey Sommers at corey.sommers@arangodb.com to receive access to the GAE for the purposes of reproducing this benchmark in your environment (to ensure objectivity of results).

Conclusion

The benchmark results clearly demonstrate ArangoDB's far superior performance over Neo4j in the categories of graph computation and loading tasks. ArangoDB's significant speed advantages - particularly its ability to execute complex algorithms and load large datasets much faster - highlight its optimized architecture and efficient data handling.

These findings make ArangoDB a compelling choice for applications requiring high-performance graph analytics and real-time data processing.

ArangoJS 4 Alpha: Available Now for Testing

The first alpha of the upcoming major release of arangojs, the official JavaScript driver, is now available on npm.

Version 4 streamlines the driver’s API by removing unnecessary server roundtrips to obtain references to collections and graphs that already exist:

Before:

var db = require('arangojs')();
db.collection('users')
.then(function(collection) {
 return collection.import(allTheUsers)
})
.then(function() {
 return db.collection('blogs')
})
.then(function(collection) {
 return collection.import(allTheBlogs);
})
.then(function() {
 return db.collection('articles')
})
.then(function(collection) {
 return collection.import(allTheArticles);
})
.then(handleSuccess)
.catch(handleErrors);

After:

var db = require('arangojs')();
db.collection('users').import(allTheUsers)
.then(function() {
 return db.collection('blogs').import(allTheBlogs);
})
.then(function() {
 return db.collection('articles').import(allTheArticles);
})
.then(handleSuccess)
.catch(handleErrors);

ArangoDB JavaScript Driver 3.7: Promises & Performance

ArangoJS, the official ArangoDB JavaScript client, has been updated to version 3.7.0. The new release features significant performance improvements in Node.js and io.js. The dependency on the third-party request module has been replaced with a thin wrapper around node’s own http module, bringing a 3-4x performance improvement for consecutive requests by maintaining a connection pool.

The earlier 3.5 release also added optional support for ES6 promises. While ArangoJS does not provide a promise implementation itself, all asynchronous methods now return promises in JavaScript environments that support them – whether natively (e.g. in io.js or modern browsers) or using a polyfill like es6-promise.

The latest version of ArangoJS is available on NPM and GitHub.

Crawling GitHub with Promises: ArangoDB Tutorial

The new Javascript driver no longer imposes any promises implementation. It follows the standard callback pattern with a callback using err and res.

I wanted to give the new driver a try. A github crawler seemed like a good side-project, especially because the node-github driver follows the same conventions as the Javascript driver.

There are a lot of promise libraries out there. The most popular one – according to NPM – was promises. It should be possible to use any implementation. Therefore I used this one.

ArangoDB Java Driver for Graphs: Enhanced Functionality

After defining a graph and filling it with some vertices and edges (see part 1), the time has come to retrieve information out of the graph.

Please take a look at the defined graph operations of ArangoDB. These will be the base for our next examples. (Yes, there may be other ways to get the results, this post does not claim completeness!)

We will start with some easy stuff and then smoothly advance in complexity.

Question: “How many edges are defined within the graph?”

ArangoDB Java Driver: Graph Data Manipulation & Queries

With ArangoDB 2.2 the new graph API was released featuring multi collection graphs (see blog). With the new version (2.2.1) of arangodb-java-driver the new graph API is supported. In the following you can find a small example of creating a graph with Java.

For the import via maven and configuring the driver, please read the Basics and Driver Setup. For the following we assume, that arangodbDriver is a configured instance of the driver.

So let’s start the whole thing…

ArangoDB Java Driver: Batch & Asynchronous Mode

The current arangodb-java-driver supports the usage of ArangoDB’s batch and asynchronous interface. This post will guide you through the usage of these features.

The batch interface

The batch interface enables the user to stack a series of calls and execute them in a single batch request. Each stacked request returns a request id that can be used to retrieve its individual result from the batch response. So how do you use this feature in the Java driver?

First we create an instance of the java driver:

ArangoConfigure configure = new ArangoConfigure();
configure.init();
ArangoDriver driver = new ArangoDriver(configure);

ArangoDB Java Driver: Simplifying Database Interactions

A new arangodb-java-driver is out now and available on GitHub. The driver supports ArangoDB from version 2.2 onwards.

How to include the driver in your application?

The driver is available as a Maven artifact. To add the driver to your project with Maven, add the following code to your pom.xml:

<dependencies>
  <dependency>
    <groupId>com.arangodb</groupId>
    <artifactId>arangodb-java-driver</artifactId>
    <version>2.2</version>
  </dependency>
  ....
</dependencies>

AshikawaCore 0.10 Released: Enhancements & Updates

We just released version 0.10 of the low-level ArangoDB Ruby driver Ashikawa::Core. It supports both ArangoDB 1.4 and 2.0. For more details see the release notes.

We’re also working on the first version of Guacamole: It is an ODM for Rails and is based upon Ashikawa::Core.

ArangoDB PHP Driver Version 1.0 Released | ArangoDB 2012

Yesterday version 1.0 of the PHP driver for ArangoDB was released. It contains basic support for edges as well as fixes and tests. Check out the Changelog for further details. Many thanks go to Frank Mayer for his contribution! 🙂

There is also a comprehensive PHP driver tutorial available on Github.
