Comments on: Improving Databases: Open Source Competitive Benchmark

By: Claudius Weinberger

Claudius Weinberger — Wed, 15 Jul 2015 21:58:00 +0000

The discussion around the node driver Oriento has really escalated into a flame war.

There are requests where the performance of the driver should not play any role. For instance, aggregation is only one call and shortest path is 19 calls. In these cases the performance improvement from minutes to milliseconds is clearly attributable to the new server. Then there are the bulk read and write requests, which are a mix: part of the performance is lost in the node driver, but this is true for all databases.
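To make the argument about call counts concrete, here is a rough back-of-the-envelope sketch. The per-call overhead of 10 ms is a purely hypothetical figure chosen for illustration, not a measured value for Oriento or any other driver, and the helper name is made up.

```python
def driver_overhead_seconds(calls, per_call_overhead_ms):
    """Upper bound on time spent in the driver for a request made of `calls` round trips."""
    return calls * per_call_overhead_ms / 1000.0


# Hypothetical per-call driver overhead in milliseconds; NOT a measured value.
ASSUMED_OVERHEAD_MS = 10.0

if __name__ == "__main__":
    # Call counts taken from the comment: 1 for aggregation, 19 for shortest path.
    for name, calls in [("aggregation", 1), ("shortest path", 19)]:
        bound = driver_overhead_seconds(calls, ASSUMED_OVERHEAD_MS)
        print(f"{name}: at most {bound:.2f} s spent in the driver")
```

Even with such a generous assumption, 19 calls stay well below a second, so a change from minutes to milliseconds cannot come from the driver alone.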

BTW an update of the blog post is available, see https://www.arangodb.com/2015/07/multi-model-benchmark-round-1/

By: fceller

fceller — Sat, 11 Jul 2015 08:11:00 +0000

In reply to s.molinari. @scamo:disqus see my comment above

By: fceller

fceller — Sat, 11 Jul 2015 08:04:00 +0000

s.molinari sorry for the delay, I have been away a few days working on customer projects.

Michael Hunger has pointed out that Neo4J gives more guarantees when writing, namely that the data has been added to the transaction log and that this log has been synced to disk. Therefore, if you kill Neo4J directly after receiving the answer, the newly written data will be recovered. For MongoDB and ArangoDB, the default guarantees are much more relaxed, so this was not a fair comparison. We therefore added a new category “write-sync”. However, with Neo4J there is no option to relax this, so we removed it from the “write” test.

Please note that “write-sync” will be extremely sensitive to the storage used. It will be much slower on a hard disk, where you might get as low as 40 document writes per second (without parallelism). Even for SSDs there will be a huge difference depending on the model. There were even models which simply lied about fsync. See http://de.triagens.com/frank-celler/2011/10/benchmarking-ssd-with-mongodb-and-couchdb-part-3/ for details.
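As a rough illustration of why syncing matters so much, the following is a minimal sketch and not the benchmark code. It appends small documents to a plain file, once relying on the operating system's buffering and once calling fsync after every document. The function name, file names and document count are made up for the example, and the numbers it prints depend entirely on the disk used.

```python
import os
import tempfile
import time


def write_documents(path, docs, sync):
    """Append each document to `path`; if `sync` is true, fsync after every write.

    Returns the elapsed time in seconds.
    """
    start = time.perf_counter()
    with open(path, "ab") as log:
        for doc in docs:
            log.write(doc)
            if sync:
                log.flush()             # hand Python's buffer to the OS
                os.fsync(log.fileno())  # ask the OS to push it to the device
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical tiny documents; the real benchmark uses a different data set.
    docs = [b'{"_key": "%d"}\n' % i for i in range(200)]
    with tempfile.TemporaryDirectory() as tmp:
        relaxed = write_documents(os.path.join(tmp, "relaxed.log"), docs, sync=False)
        synced = write_documents(os.path.join(tmp, "synced.log"), docs, sync=True)
    # Guard against a zero elapsed time on very coarse clocks.
    print(f"relaxed: {len(docs) / max(relaxed, 1e-9):.0f} docs/s")
    print(f"synced:  {len(docs) / max(synced, 1e-9):.0f} docs/s")
```

On a device that honours fsync, the synced run is usually far slower. If both runs come out about equally fast, that may be a hint that the storage is not really syncing.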

By: s.molinari

s.molinari — Sat, 11 Jul 2015 06:00:00 +0000

What does the write-sync test measure?

Scott

By: Claudius Weinberger

Claudius Weinberger — Thu, 09 Jul 2015 08:41:00 +0000

In reply to scamo.

Nice that you like our approach. We have put a lot of effort into it and there is still a lot of work to do 😉 Therefore it is a bit sad that some people accuse us of manipulation if we need a few days to react.

I was also very surprised by the initial results. For example, the shortest path results were confusing. I was expecting better results, especially if there is no shortest path at all. The algorithms are now fixed in the database. But it seems that shortest path does not play a big role in real projects.

A lot of improvements have happened in the meantime. Almost all of these improvements were inside the database, not in the node driver, which was marked as the culprit early on by some.

“The objectivity and the accuracy of a benchmark are what are important and the fact you are allowing the other vendors to help improve the benchmark code and data is great.”

That is exactly what I meant. Normally you only get benchmarks like “on my computer I got” without any chance to test and improve. One can debate whether the selected use cases are suitable or not. And surely the results will look different for a different set of tests, a different kind of server, or a different operating system. The use cases we have selected are for real data and from a real project. However, we wanted to give everyone the opportunity to verify the tests or adjust the environment to her/his needs.

By: fceller

fceller — Thu, 09 Jul 2015 07:10:00 +0000

In reply to MrFT.

@ftvsko:disqus for OrientDB I’m currently using the following query:

SELECT set(out_Relation.key, out_Relation.out.Relation._key) FROM Profile WHERE _key = :key

Which query do you suggest to fetch the complete documents?

By: fceller

fceller — Thu, 09 Jul 2015 07:07:00 +0000

In reply to Dário Marcelino.

I have created an import script and uploaded it to https://github.com/weinberger/nosql-tests/tree/master/orientdb and I will also rerun the tests. Please note that currently a schema-full database is created.

By: scamo

scamo — Thu, 09 Jul 2015 04:02:00 +0000

In reply to Claudius Weinberger.

“The benchmark was never supposed to make any database look bad.”

Unfortunately, that is the result of any benchmark. It is a comparison and rarely do all the products being compared come out looking the same. At least one always comes out looking the worst, even if the actual result isn’t all that bad. The others simply look better.

The objectivity and the accuracy of a benchmark are what are important and the fact you are allowing the other vendors to help improve the benchmark code and data is great. I also find it interesting how well MongoDB came out in some of the results, considering it doesn’t even really support graphs directly.

One thing I am also not certain about: what is the “write-sync” test? I don’t recall reading what is being tested with that.

Scott

By: Claudius Weinberger

Claudius Weinberger — Wed, 08 Jul 2015 19:22:00 +0000

In reply to scamo. The benchmark was never supposed to make any database look bad. I was quite surprised by some of the discussion, which ended in a flame war. I wanted to show that multi-model does not mean you have to sacrifice performance. There will be a lot of use cases where one database is much better than another, while in a different use case it is the other way around. The goal was to show that ArangoDB is fast enough, and we want to convince people with features like microservices, extensibility, ease of use, and flexibility.

By: scamo

scamo — Wed, 08 Jul 2015 16:07:00 +0000

I like the openness you gents at Arango are showing to make this as fair a benchmark as possible. I find that refreshing in a world often full of cutthroat enterprises.

Scott