ArangoDB 3.2 Beta: RocksDB Storage Engine & Distributed Graph Cluster

Today we’re excited to release the beta of ArangoDB 3.2. It’s feature rich, well tested and hopefully plenty of fun for all of you. Keen to take it for a spin? Get ArangoDB 3.2 beta here.

With ArangoDB 3.2, we’re introducing the long-awaited pluggable storage engine and its first new citizen, RocksDB from Facebook.

  • RocksDB: You can now use as much data in ArangoDB as you can fit on your disk. Plus, you can enjoy performance boosts on writes by having only document-level locks (more info below).
  • Pregel: Furthermore, we implemented distributed graph processing with Pregel for discovering hidden patterns, identifying communities and performing in-depth analytics of large graph data sets.
  • ClusterFoxx: Another important upgrade is what we internally and playfully call the ClusterFoxx. The Foxx management internals have been rewritten from the ground up to make sure multi-coordinator cluster setups always keep their services in sync and new coordinators are fully initialised even when all existing coordinators are unavailable.
  • Enterprise: Working with some of our largest customers, we’ve added further security and scalability features to ArangoDB Enterprise like LDAP integration, Encryption at Rest, and the brand new Satellite Collections.

The goal of the whole ArangoDB 3 release cycle has been to scale the multi-model idea to new heights. Getting ‘ready’ for large scale applications is not done overnight and it’s definitely not possible without the help of a strong community. We’d like to invite all of you to lend us a helping hand to make ArangoDB 3.2 the best release ever. Please push this beta to its limits: test it for your use cases and compare the performance of the new features like RocksDB. Please report any bugs you find on GitHub. Don’t worry about hurting our feelings: we want to fix any problems.

Join the Beta Bug Hunt Challenge and win a $200 Amazon Gift Card as first prize. You can find more details about this reward program at the end of this post.

New Storage Engine RocksDB

ArangoDB now comes with two storage engines: MMFiles and RocksDB. If you want to compare the engines, you can use arangodump to export data from either engine and arangorestore to import it into the other. MMFiles is generally well suited for use cases whose data fits into main memory, while RocksDB allows larger-than-memory working sets.
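As a rough sketch of that comparison workflow, assume one server running MMFiles on port 8529 and a second one started with `--server.storage-engine rocksdb` on port 8530; the ports, database name and dump directory are hypothetical, and the target database is assumed to exist already. The helper only builds the `arangodump` and `arangorestore` argument lists and runs them through `subprocess` if asked to.

```python
import subprocess
from typing import List


def dump_cmd(endpoint: str, database: str, directory: str) -> List[str]:
    """Build an arangodump invocation that exports one database to a directory."""
    return [
        "arangodump",
        "--server.endpoint", endpoint,
        "--server.database", database,
        "--output-directory", directory,
    ]


def restore_cmd(endpoint: str, database: str, directory: str) -> List[str]:
    """Build an arangorestore invocation that imports a dump directory."""
    return [
        "arangorestore",
        "--server.endpoint", endpoint,
        "--server.database", database,
        "--input-directory", directory,
    ]


def copy_between_engines(src: str, dst: str, database: str, directory: str,
                         execute: bool = False) -> List[List[str]]:
    """Dump from the source server, then restore into the target server."""
    cmds = [dump_cmd(src, database, directory), restore_cmd(dst, database, directory)]
    if execute:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmds


if __name__ == "__main__":
    # Hypothetical endpoints: MMFiles server on 8529, RocksDB server on 8530.
    for cmd in copy_between_engines("tcp://127.0.0.1:8529", "tcp://127.0.0.1:8530",
                                    "mydb", "dump-mydb"):
        print(" ".join(cmd))
```

Authentication options (such as a username) are left out to keep the sketch short; add them as your setup requires.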

RocksDB has plenty of configuration options; we have selected general-purpose defaults. Please let us know how it works for your use case so that we can further optimize the implementation. Note that we test extensively on Linux, Windows and macOS, but we optimize for Linux. Any feedback regarding other operating systems is very welcome. Check out the step by step guide to compare both storage engines for your use case and OS!

Benefits of RocksDB Storage Engine:

  • Document-level locks: performance boost for write intensive applications. Writes don’t block reads, and reads don’t block writes
  • Support for large datasets: go beyond the limit of main memory and stay performant
  • Persistent indexes: faster index build after restart

Things to consider before switching to RocksDB

  • RocksDB allows concurrent writes: write-write conflicts can occur, so applications switching from MMFiles must be prepared to handle the resulting exceptions
  • Transaction Limit in RocksDB: The total size of a transaction is limited in RocksDB. Statements that modify large numbers of documents have to commit intermediately; AQL does this by default.
  • Engine Selection on Server/Cluster Level: It’s not possible to mix both storage engines within a single instance or cluster installation. Transaction handling and write ahead log formats are different.
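Both of the first two points affect application code. Here is a minimal sketch, assuming a hypothetical `ConflictError` that your driver raises when the server reports a write-write conflict, and a hypothetical `apply_batch` callable that modifies and commits one batch in its own transaction. A large update is split into separately committed batches to stay under the transaction size limit, and each batch is retried with exponential backoff on conflict.

```python
import random
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class ConflictError(Exception):
    """Hypothetical stand-in for the driver's write-conflict exception."""


def chunked(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def apply_in_batches(items: Sequence[T],
                     apply_batch: Callable[[Sequence[T]], None],
                     batch_size: int = 1000,
                     max_retries: int = 5,
                     base_delay: float = 0.01) -> int:
    """Apply and commit each batch separately; retry a batch on write conflict.

    Returns the number of committed batches.
    """
    committed = 0
    for batch in chunked(items, batch_size):
        for attempt in range(max_retries + 1):
            try:
                apply_batch(batch)  # assumed to run and commit one transaction
                committed += 1
                break
            except ConflictError:
                if attempt == max_retries:
                    raise  # give up and surface the conflict to the caller
                # Exponential backoff with jitter before retrying this batch.
                time.sleep(base_delay * (2 ** attempt) * random.random())
    return committed
```

The trade-off is the same one intermediate commits imply: each batch is atomic on its own, but the update as a whole is not, so a failure can leave earlier batches committed.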

Find all important details about RocksDB in the storage engine documentation, as well as answers to common questions about our RocksDB integration in the FAQ.

Please note that ArangoDB 3.2 beta is fully tested, but not yet fully optimized (see the known issues for RocksDB). If you find something that is much slower with RocksDB compared to your current queries with the MMFiles engine, please open a GitHub issue. Please check the comparison guide here.

New Distributed Graph Processing

With the new implementation of distributed graph processing, you are now able to analyze even very large graph data sets as a whole. Internally, we implemented the Pregel computing model to enable ArangoDB to support arbitrary graph algorithms that scale with your data or with the size of your database cluster.


You can already use a number of well-known graph algorithms:

  • PageRank
  • Weakly Connected Components
  • Strongly Connected Components
  • HITS (hubs and authorities)
  • Single-Source Shortest Path
  • Community Detection via Label Propagation
  • Vertex Centrality measures
    • Closeness Centrality via Effective Closeness
    • Betweenness Centrality via LineRank
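To illustrate the Pregel model itself, here is a small single-process sketch in plain Python. It is not ArangoDB's implementation or API. Computation proceeds in supersteps: each vertex reads the messages sent to it in the previous superstep, updates its value and sends messages along its outgoing edges. PageRank fits this model directly. The damping factor of 0.85 and the number of supersteps are conventional choices, not ArangoDB defaults.

```python
from collections import defaultdict
from typing import Dict, List


def pregel_pagerank(edges: Dict[str, List[str]], supersteps: int = 30,
                    damping: float = 0.85) -> Dict[str, float]:
    """Vertex-centric PageRank: one loop iteration corresponds to one superstep."""
    vertices = set(edges) | {t for targets in edges.values() for t in targets}
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(supersteps):
        # Message phase: every vertex sends rank / out_degree to its neighbours.
        inbox: Dict[str, float] = defaultdict(float)
        dangling = 0.0
        for v in vertices:
            targets = edges.get(v, [])
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    inbox[t] += share
            else:
                dangling += rank[v]  # rank of vertices without out-edges is spread evenly
        # Compute phase: each vertex combines the messages it received.
        rank = {v: (1 - damping) / n + damping * (inbox[v] + dangling / n)
                for v in vertices}
    return rank


if __name__ == "__main__":
    print(pregel_pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}))
```

In a distributed setting, the message phase is where vertices on different machines exchange data; the computation of each vertex only ever looks at its own state and inbox, which is what lets the model scale out.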

By using these new capabilities, you are now able, for example, to detect communities within your graph, shard your data according to these communities and leverage ArangoDB SmartGraphs to perform highly efficient graph traversals even in a cluster setup.
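Community detection via label propagation follows the same superstep pattern. Below is a minimal synchronous sketch in plain Python, treating edges as undirected and breaking ties by the smallest label so the result is deterministic. The tie-break is an illustrative choice; ArangoDB's own implementation may differ. The resulting labels are the kind of value you could then use as a sharding attribute.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set, Tuple


def label_propagation(edges: Iterable[Tuple[Hashable, Hashable]],
                      max_supersteps: int = 20) -> Dict[Hashable, Hashable]:
    """Each vertex repeatedly adopts the most frequent label among its neighbours."""
    neighbours: Dict[Hashable, Set[Hashable]] = {}
    for a, b in edges:  # treat every edge as undirected
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    label = {v: v for v in neighbours}  # initially every vertex is its own community
    for _ in range(max_supersteps):
        new_label = {}
        for v, nbrs in neighbours.items():
            counts = Counter(label[u] for u in nbrs)
            best = max(counts.values())
            # Deterministic tie-break: smallest label among the most frequent ones.
            new_label[v] = min(lab for lab, cnt in counts.items() if cnt == best)
        if new_label == label:
            break  # no vertex changed its label: converged
        label = new_label
    return label


if __name__ == "__main__":
    print(label_propagation([(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)]))
```

Synchronous label propagation can oscillate on some graph shapes, which is why the sketch caps the number of supersteps.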

New ClusterFoxx

Managing your JavaScript microservices is now easier and more reliable than ever before. The Foxx management internals have been rewritten to make sure multi-coordinator cluster setups always keep their services in sync and new coordinators are fully initialised even when all existing coordinators are unavailable.

Additionally, the new fully documented REST API for managing Foxx services enables you to install, upgrade and configure your services using your existing devops processes. And if your service only consists of a single JavaScript file, you can now forgo the manifest and upload that file directly, instead of creating a full bundle.
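As an illustration of the single-file workflow, here is a sketch using only Python's standard library. It assumes the Foxx HTTP API's install route `POST /_api/foxx` with a `mount` query parameter and a raw JavaScript request body; please check the Foxx HTTP API reference for your version. The server URL, mount point and service code are hypothetical, and authentication is omitted.

```python
import urllib.parse
import urllib.request


def foxx_install_request(server: str, mount: str, source: str) -> urllib.request.Request:
    """Build (but do not send) a request that installs a single-file Foxx service."""
    query = urllib.parse.urlencode({"mount": mount})
    return urllib.request.Request(
        url=f"{server}/_api/foxx?{query}",
        data=source.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/javascript"},
    )


# Hypothetical service: a single file with one route and no manifest.
SERVICE = """
'use strict';
const createRouter = require('@arangodb/foxx/router');
const router = createRouter();
module.context.use(router);
router.get('/hello', (req, res) => res.send('Hello World!'));
"""

if __name__ == "__main__":
    req = foxx_install_request("http://localhost:8529", "/hello", SERVICE)
    print(req.get_method(), req.full_url)
    # To actually install the service: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the call easy to inspect or reuse in a deployment script.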

Further useful new features included in ArangoDB 3.2 beta

  • geo_cursor: Get documents sorted by distance to a certain point in space. You can also apply filters and limits to geo_cursor.
  • arangoexport: Export data as JSON, JSONL and even graphs as XGMML for visualisation in Cytoscape. You can find details in the Alpha2 release post.
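Conceptually, a geo cursor yields documents in order of increasing distance from a reference point, and filters and limits are applied to that ordered stream. Here is a plain-Python sketch of those semantics using the haversine formula. It does not reflect how ArangoDB's geo index works internally, and the document shape with `lat` and `lon` attributes is an assumption.

```python
import heapq
import math
from typing import Callable, Dict, Iterator, List, Optional

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geo_cursor(docs: List[Dict], lat: float, lon: float) -> Iterator[Dict]:
    """Yield documents ordered by distance to (lat, lon), nearest first."""
    # The list index breaks distance ties so dicts are never compared.
    heap = [(haversine(lat, lon, d["lat"], d["lon"]), i, d) for i, d in enumerate(docs)]
    heapq.heapify(heap)
    while heap:
        _, _, doc = heapq.heappop(heap)
        yield doc


def nearest(docs: List[Dict], lat: float, lon: float,
            predicate: Optional[Callable[[Dict], bool]] = None,
            limit: int = 10) -> List[Dict]:
    """Apply an optional filter and a limit to the distance-ordered stream."""
    if limit <= 0:
        return []
    out: List[Dict] = []
    for doc in geo_cursor(docs, lat, lon):
        if predicate is None or predicate(doc):
            out.append(doc)
            if len(out) == limit:
                break  # stop consuming the cursor once the limit is reached
    return out
```

Because the cursor is lazy, a small limit means only the nearest few documents are ever pulled from it.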

Download ArangoDB 3.2 beta Community

New Enterprise Edition Features in 3.2

The Enterprise Edition of ArangoDB is focused on solving enterprise-scale problems and meeting high security standards. In version 3.1, we introduced SmartGraphs to bring fast traversal times to sharded graph datasets. With SatelliteCollections, we bring the same performance boost to join operations at scale.

SatelliteCollections

From genome-sequencing projects to massive online games and beyond, we see the need for join operations involving sharded collections with sub-second response times.

With SatelliteCollections, you can define collections to shard to a cluster and collections to replicate to each machine. The ArangoDB query optimizer knows where each shard is located and sends the requests to the DBServers involved, which then execute the query locally. With this approach, network hops during join operations on sharded collections can be avoided and response times can be close to those of a single instance.

In the example below, collection C is large and sharded to multiple machines while the smaller satellites (S1-S5) are replicated to each machine.
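A toy model in plain Python shows why this avoids network hops: the large collection `C` is split into shards, and each simulated DBServer holds one shard of `C` plus a full copy of a satellite `S`, so each server can join its shard locally and the results only need to be concatenated. The data and the `ref` join attribute are made up for illustration; this is not ArangoDB's query execution code.

```python
import zlib
from typing import Dict, List, Tuple


def shard(docs: List[Dict], n_shards: int, key: str = "_key") -> List[List[Dict]]:
    """Distribute documents across shards by a deterministic hash of the shard key."""
    shards: List[List[Dict]] = [[] for _ in range(n_shards)]
    for d in docs:
        shards[zlib.crc32(str(d[key]).encode("utf-8")) % n_shards].append(d)
    return shards


def local_join(c_shard: List[Dict], satellite: List[Dict], attr: str) -> List[Tuple[Dict, Dict]]:
    """Join one shard of C with the locally available full copy of the satellite."""
    index = {s["_key"]: s for s in satellite}
    return [(c, index[c[attr]]) for c in c_shard if c[attr] in index]


def satellite_join(c_docs: List[Dict], satellite: List[Dict], attr: str,
                   n_servers: int = 3) -> List[Tuple[Dict, Dict]]:
    """Each loop iteration plays one DBServer; no lookup crosses servers."""
    results: List[Tuple[Dict, Dict]] = []
    for c_shard in shard(c_docs, n_servers):
        results.extend(local_join(c_shard, satellite, attr))
    return results
```

The cost of this design is storage and write amplification for the satellites, which is why it suits small collections that are joined often.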

We are super excited to see what you will create with this new feature and welcome any feedback you can provide. The Enterprise Edition of ArangoDB is forever free for evaluation. So feel free to take it for a spin.

Encryption at Rest

With RocksDB, you can encrypt the data stored on disk using a highly secure AES algorithm. With this upgrade, ArangoDB takes another big step towards HIPAA compliance. Even if someone steals one of your disks, they won’t be able to access the data.
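Encryption at rest is configured with a key file. The sketch below assumes the Enterprise startup option `--rocksdb.encryption-keyfile` and its requirement of exactly 32 bytes of random data (an AES-256 key); please verify both against the documentation for your version. Any file path you pass in is your own choice.

```python
import os
import secrets

KEY_BYTES = 32  # length of an AES-256 key


def write_keyfile(path: str) -> None:
    """Write 32 cryptographically random bytes to a file readable only by its owner."""
    key = secrets.token_bytes(KEY_BYTES)
    # O_EXCL refuses to overwrite an existing key file, because replacing the key
    # would make previously encrypted data unreadable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key)
```

After generating the file, the server would be started with something like `--server.storage-engine rocksdb --rocksdb.encryption-keyfile <path>`. Keep a secure backup of the key: without it, the encrypted data cannot be read.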

Enhanced Authorisation with LDAP

Normally, users are defined and managed in ArangoDB itself. With LDAP, you can use an external server to manage your users. We have implemented a common schema which can be extended. If you have special requirements that do not fit into this schema, please let us know (#feedback32 channel). A general note: The final release will also support read-only users. With this beta, only read/write users are supported.

Download ArangoDB 3.2 beta Enterprise

Bug Hunt Competition

We’d love to invite all of you to try ArangoDB 3.2 beta and report all the bugs you can find. We’re hoping there won’t be any, but there always are some. Everyone reporting bugs for 3.2 beta on GitHub takes part in the Beta Bug Hunt Competition. Every GitHub issue filed against version 3.2 beta that the ArangoDB team marks as a bug counts. The hunter with the most reported bugs wins. The first three runners-up will receive an honorable mention in the Bug Hunt Challenge post.

How the Bug Hunt Competition works:

  • Duration: from June 13th until June 27th
  • Bugs: Any Github issue for 3.2 beta release which is marked as bug by ArangoDB team
  • Who can participate: everyone
  • First Prize: a $200 Amazon Gift Card + ArangoDB Swag Package
  • Second Prize: $100 Amazon Gift Card + ArangoDB Swag Package
  • Runner-Up: ArangoDB Swag Package

All winners will receive an honorable mention in the bug hunt post and tweets after the challenge. Please note that our team has to be able to reproduce the bug.

Therefore, good bug reports:

  1. Include only the necessary information
  2. Provide a step-by-step description to reproduce the bug
  3. Provide demo data, e.g. via a gist (if necessary)

We hope you will enjoy this new release - “Gute Jagd!”

Legal

In connection with your participation in this program you agree to comply with all applicable local and national laws. You must not disrupt or compromise any data that is not your own.

ArangoDB reserves the right to discontinue this program or change or modify its terms at any time. The ultimate decision over an award (whether to give one and in what amount) is entirely at ArangoDB’s discretion.
You are exclusively responsible for paying any taxes on any reward you receive.

Vulnerabilities obtained by exploiting ArangoDB users or instances on the Internet are not eligible for an award and will result in immediate disqualification from the program.


RocksDB Integration in ArangoDB: FAQs

The new release of ArangoDB 3.2 is just around the corner and will include some major improvements like distributed graph processing with Pregel or a powerful export tool. But most importantly we integrated Facebook’s RocksDB as the first pluggable storage engine in ArangoDB. With RocksDB you will be able to use as much data in ArangoDB as fits on your disk.

Since this is an important change and we have received many questions from the community, we want to share answers to the most common ones below.

Will I be able to go beyond the limit of RAM?

Yes. By defining RocksDB as your storage engine you will be able to work with as much data as fits on your disk.

What is the locking behaviour with RocksDB in ArangoDB?

Read more


Alpha3 of ArangoDB 3.2: Support for Distributed Graph Processing

The next alpha release of the upcoming ArangoDB 3.2 is available for testing. You can download and install alpha3 here.

Moving forward

As ArangoDB 3.2 will include several new features and improvements, we realized that the release model that we currently follow has room for improvement. Going forward, we will introduce milestone releases with ArangoDB 3.3. For this major release, you will see a few more alphas. You can read details about the new release model here.

Pregel computing model

In this alpha we introduce support for incremental graph processing algorithms on a single server as well as in the cluster. Read more


Introducing the milestone release model and why it is better

When developing ArangoDB, we want to share with you new features as early as possible. For example, the pregel implementation that will become part of ArangoDB 3.2 was ready for testing weeks before the release date of the first beta release and the final release of 3.2.

Therefore we decided to create intermediate releases, called alpha releases, which contain new features as soon as they become stable. The benefits for the community and also for our developers team are: Read more


ArangoDB 3.2 Alpha 2: Preview of Upcoming Release

The official ArangoDB 3.2 release is just around the corner. In the meantime, you can play around and test some of the upcoming new features as they come along. The alpha2 version of the upcoming ArangoDB 3.2 is available for testing and can be downloaded here. If you already have ArangoDB installed, please remember to back up your data and run an upgrade after installing the alpha2 release. Note that this version is not suitable for production usage and is supplied only for testing purposes.

Without going into too much detail yet: one major change in ArangoDB 3.2 is that it will contain two storage engines, the current one based on memory-mapped files and a new one backed by RocksDB. This alpha2 release contains some steps towards this goal, as well as independent improvements and previews of new features. Read more


ArangoDB 3.1: Scaling Solutions, Part II

It’s not that long since we released ArangoDB 3.0, in which we introduced our binary storage format VelocyPack, the ArangoDB Agency for a self-managing cluster and the first persistent index based on Facebook’s RocksDB. With all that, we laid a solid foundation for scaling with all three data models.

With today’s ArangoDB 3.1 release we take things a few steps further and make cluster usage of ArangoDB more performant and convenient. Get ArangoDB 3.1.

General upgrades in 3.1

  • Performance boost with our new Boost.Asio server infrastructure
  • Performance boost by overhauling the ArangoDB query optimizer
  • Improved internal abstraction for storage-engines as a preparation for MVCC and pluggable storage-engines
  • VelocyPack over HTTP: Use our binary storage format VelocyPack over HTTP
  • VelocyStream: for high performance needs you can now directly stream VelocyPack. This is already implemented in our Java driver (all other drivers maintained by ArangoDB will follow soon).

Cluster

  • Parallel Intra-Cluster-Communication
  • HLC: The Hybrid Logical Clock is used for timestamps in revision strings which is part of the preparation for cluster-wide transactions
  • Auto-failover timeouts: you can now configure the timeouts for automatic failovers
  • Progress Display when relocating shards
  • Stand-Alone Agency: You can now use ArangoDB as a resilient, RAFT-based key/value store as an alternative to e.g. ZooKeeper or etcd. (You’ll surely ask yourself why we created it and we’ll answer this legitimate question in a blog post soon).
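For readers unfamiliar with hybrid logical clocks: an HLC timestamp pairs a physical time component with a logical counter, so timestamps never go backwards and respect causality even when wall clocks drift between machines. The following is a minimal sketch of the standard HLC algorithm (Kulkarni et al.), not ArangoDB's code; the millisecond wall clock is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (logical time derived from physical time, counter)


@dataclass
class HybridLogicalClock:
    physical: Callable[[], int] = lambda: int(time.time() * 1000)  # ms wall clock
    logical: int = 0  # largest physical time seen so far (l in the paper)
    counter: int = 0  # tie-breaker for events with the same logical time (c)

    def now(self) -> Timestamp:
        """Timestamp for a local event or an outgoing message."""
        prev = self.logical
        self.logical = max(prev, self.physical())
        self.counter = self.counter + 1 if self.logical == prev else 0
        return (self.logical, self.counter)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp received from another node."""
        lm, cm = remote
        prev = self.logical
        self.logical = max(prev, lm, self.physical())
        if self.logical == prev and self.logical == lm:
            self.counter = max(self.counter, cm) + 1
        elif self.logical == prev:
            self.counter += 1
        elif self.logical == lm:
            self.counter = cm + 1
        else:
            self.counter = 0  # physical clock is ahead of everything seen
        return (self.logical, self.counter)
```

Timestamps compare as plain tuples, and the logical component stays close to real time, which keeps such values readable when embedded in revision strings.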

Graph Features

  • Vertex-centric indices for graphs in AQL: You can now create indices on edges that combine the vertex (_from or _to) with another attribute.
  • SmartGraphs: This is the big new feature of our Enterprise Edition and enables you to shard really huge graphs to a cluster and achieve close to the same performance as on a single instance. Read more about SmartGraphs and our Enterprise Edition.
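The benefit of a vertex-centric index can be sketched in plain Python: instead of scanning all edges of a vertex and then checking an attribute, the matching edges are looked up directly by the combination of `_from` and that attribute. The `type` attribute and the edge data are hypothetical; in ArangoDB itself, such an index would be defined on the edge collection over fields like `["_from", "type"]`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_vertex_centric_index(edges: List[Dict], attr: str) -> Dict[Tuple[str, object], List[Dict]]:
    """Group edges by (_from, attr) so a single lookup finds exactly the matching edges."""
    index: Dict[Tuple[str, object], List[Dict]] = defaultdict(list)
    for e in edges:
        index[(e["_from"], e.get(attr))].append(e)
    return index


def neighbours_by_type(index: Dict[Tuple[str, object], List[Dict]],
                       vertex: str, value: object) -> List[str]:
    """Targets of edges leaving `vertex` whose attribute equals `value`."""
    return [e["_to"] for e in index.get((vertex, value), [])]
```

The gain is largest for vertices with very many edges, where only a small fraction matches the attribute value a traversal is interested in.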

For the Java World

We put a lot of effort into our new Java Driver which will only work with ArangoDB 3.1 onwards. Our Java team completely refactored the driver which is now up to 4x faster than the previous one. The new features include:

  • Multi-document operations
  • Uses VelocyPack
  • VelocyStream ready
  • Asynchronous request handling

Read more about the features in the corresponding blog post. You can download the new Java drivers here: ArangoDB-Java-Driver 4.1.0 & ArangoDB-Java-Driver-Async 4.1.0. We also included a new detailed Java drivers documentation.

Web UI

  • New Graph Viewer: The previous solution was not suitable for large graph visualizations. With extended Canvas support, the Graph Viewer is now feature-rich and can handle large graph visualizations. As a second engine, we made a first implementation of WebGL. Feedback on our new Graph Viewer is highly appreciated.
  • AQL Editor: We invested a lot into usability and, for example, simplified the analysis of performance issues in your queries. With the Query Performance Profiler you can now see which part of the execution took how long. You can also choose between JSON, tabular and graph output for your results.

We hope that ArangoDB 3.1 includes many useful features for you, and we greatly appreciate your feedback on the new release. If you’re missing something, find a bug or want to discuss an idea with us, feel free to get in touch via our Slack Community channel or our contact form.

Have fun playing around with ArangoDB 3.1!


ArangoDB 3.1 Enterprise: Scaling Graphs

In addition to our community version of ArangoDB 3.1 we are excited to release our first Enterprise Edition today. The Enterprise Edition of ArangoDB focuses on enterprise-scale problems and provides useful features to meet the requirements of enterprise customers. You can download a free evaluation-only version here: Download Enterprise Edition. ArangoDB Enterprise Edition also comes with the Enterprise subscription, including a comprehensive support SLA.

This first ArangoDB Enterprise Edition includes three major features:

  • SmartGraphs: Scale with graphs to a cluster and stay performant. With SmartGraphs you can use the “smartness” of your application layer to shard your graph efficiently to your machines and let traversals run locally
  • Encryption Control: Choose your level of SSL encryption
  • Auditing: Keep a detailed log of all the important things that happened in ArangoDB
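The idea behind SmartGraphs can be illustrated with a toy sharding function: vertices are placed by a "smart" attribute (for example a community or region) instead of by their key, so an edge between two vertices with the same attribute value stays on one shard and a traversal along it needs no network hop. The shard count, attribute values and data are hypothetical, and this is not ArangoDB's actual placement algorithm.

```python
import zlib
from typing import Dict, List, Tuple


def shard_of(value: str, n_shards: int) -> int:
    """Deterministic shard assignment from a string value."""
    return zlib.crc32(value.encode("utf-8")) % n_shards


def local_edge_ratio(vertices: Dict[str, str], edges: List[Tuple[str, str]],
                     n_shards: int, by_smart_attribute: bool) -> float:
    """Fraction of edges whose two endpoints land on the same shard.

    `vertices` maps each vertex key to its smart attribute value.
    """
    def place(v: str) -> int:
        return shard_of(vertices[v] if by_smart_attribute else v, n_shards)

    local = sum(1 for a, b in edges if place(a) == place(b))
    return local / len(edges)


if __name__ == "__main__":
    verts = {f"v{i}": ("eu" if i < 10 else "us") for i in range(20)}
    edges = [(f"v{i}", f"v{i + 1}") for i in range(19) if i != 9]
    print("by key:", local_edge_ratio(verts, edges, 4, False))
    print("by smart attribute:", local_edge_ratio(verts, edges, 4, True))
```

When most edges connect vertices that share the smart attribute, sharding by it keeps nearly all traversal steps on one machine; choosing that attribute well is where the "smartness" of the application layer comes in.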

Read more


ArangoDB 3.1 Release Candidate 2: What’s New

We are glad to announce that the second release candidate (RC2) of ArangoDB 3.1 is publicly available. What makes this release particularly special to us is that it also includes an official release candidate of our new Enterprise Edition with a few extra add-ons up its sleeve. The upcoming ArangoDB 3.1 will be a significant release, building on the solid base of ArangoDB 3.0, which introduced our binary storage format VelocyPack, the ArangoDB Agency for a self-managing cluster architecture and the first persistent index based on Facebook’s RocksDB.

RC2 of ArangoDB 3.1 is available for download and evaluation: Community Edition & Enterprise Edition. The documentation for the ArangoDB 3.1 RC 2 can be found here.
Read more


ArangoDB 3.0: A Solid Foundation for Scalability

After 6 months of development we are happy and excited to announce the fully production ready ArangoDB 3.0 today! Get ArangoDB 3.0 now!

We designed ArangoDB as a native multi-model DB from the first line of code. By providing three major NoSQL data models in one technology the ArangoDB team wants to fulfill its mission to simplify data work. With ArangoDB 3 we believe that our users will come more than one step closer to a dramatically simpler way to create their applications. Read more


Discover ArangoDB 3.0: New Cluster Features

The 3.0 release of ArangoDB will introduce a completely overhauled cluster and marks a major milestone on its road to “zero-maintenance” where you can keep focus on your product instead of your datacenter.

Synchronous replication

Earlier releases of ArangoDB already featured asynchronous replication. This was already a great method to do backups and allowed for failover in case of a disaster. However, failover was mostly a manual job and, due to the asynchronous nature of replication, data loss could occur. Read more
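The difference can be shown with a toy model in plain Python, which is not ArangoDB's replication protocol: with synchronous replication, a write is acknowledged only after the follower has applied it, whereas with asynchronous replication the leader acknowledges immediately and ships the write later. Writes that were acknowledged but not yet shipped are lost if the leader fails.

```python
from typing import List


class Leader:
    """Toy leader with one follower; `synchronous` selects the replication mode."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.log: List[str] = []
        self.follower_log: List[str] = []
        self.pending: List[str] = []  # writes not yet shipped (asynchronous mode)

    def write(self, doc: str) -> str:
        self.log.append(doc)
        if self.synchronous:
            self.follower_log.append(doc)  # replicate before acknowledging
        else:
            self.pending.append(doc)  # acknowledge now, replicate later
        return "ack"

    def ship(self) -> None:
        """Background replication step used in asynchronous mode."""
        self.follower_log.extend(self.pending)
        self.pending.clear()

    def fail_over(self) -> List[str]:
        """The leader crashes; the follower's log becomes the surviving data."""
        return list(self.follower_log)
```

The price of synchronous replication is latency: every write waits for the follower, which is the trade the cluster makes for not losing acknowledged data.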

