Comments on: CAP, Google Spanner, & Survival: Eventual Consistency | ArangoDB
https://arangodb.com/2014/02/cap-google-spanner-survival-eventual-consistency/

By: Matt Tagg https://arangodb.com/2014/02/cap-google-spanner-survival-eventual-consistency/#comment-969 Sun, 28 Sep 2014 08:45:00 +0000

A bit off topic, but any thoughts on CockroachDB? It’s a new open-source project inspired by Spanner:

http://www.wired.com/2014/07/cockroachdb/

By: martin Schönert https://arangodb.com/2014/02/cap-google-spanner-survival-eventual-consistency/#comment-968 Mon, 10 Feb 2014 14:48:00 +0000

I have read your white paper with great interest. Here is my understanding.

If we have a distributed system (like the example in my blog post), there is a partition (in the sense of a network/communication failure, not just the failure of individual machines/nodes), and we want to keep consistency, then some clients will not be able to execute some of their requests.
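
As a minimal illustration of this trade-off, consider a toy quorum-based system (all names hypothetical; not modeled on any particular database). During a partition, a client that can only reach a minority of replicas has its writes rejected, while clients on the majority side can proceed:

```python
# Toy model of a CP (consistent) system during a network partition.
# Illustrative sketch only; real systems add replication logs, leases, etc.

class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None

class Cluster:
    def __init__(self, replicas):
        self.replicas = replicas
        self.partition = None  # set of replica names cut off from the rest

    def reachable(self, entry_point):
        """Replicas on the same side of the partition as the client's entry point."""
        if self.partition is None:
            return self.replicas
        if entry_point.name in self.partition:
            return [r for r in self.replicas if r.name in self.partition]
        return [r for r in self.replicas if r.name not in self.partition]

    def write(self, entry_point, value):
        side = self.reachable(entry_point)
        # A consistent system only accepts a write if a majority can acknowledge it.
        if len(side) <= len(self.replicas) // 2:
            return "rejected: no quorum"
        for r in side:
            r.value = value
        return "ok"

a, b, c = Replica("A"), Replica("B"), Replica("C")
cluster = Cluster([a, b, c])
cluster.partition = {"A"}          # A is cut off from B and C

print(cluster.write(a, 1))         # client talking only to A: rejected: no quorum
print(cluster.write(b, 2))         # client on the majority side: ok
```

The client entering through A experiences the database as “down”, while clients entering through B or C see it as fully available, which is exactly the situation described above.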

There is no disagreement about that. You yourself write in your white paper: “… For any client communicating only with A, the database is down. …”.

There is also no disagreement that other clients can still execute their requests. You write: “… For these clients, the database will remain available for reads and writes…”.

So there is no disagreement about the situation itself: during a partition, some clients cannot execute all their requests while others may remain able to.

The only difference is what to call this situation. In other articles about the CAP theorem it is called “not available” or “restricted availability”. You call it “overall database availability” or “system availability”.

One could argue that this is mostly a matter of “is the glass half empty or half full?”.

And I would agree that calling the system “not available” in this situation is problematic, because it might cause people to believe that the CAP theorem says “if anything goes wrong (server dying, network problem, …), then no client can execute any request any more”. And this is clearly not true.

But I would also argue that your formulation is not unproblematic, because it might cause people to believe that with the right database “if anything goes wrong, then the system will remain available to all clients and still remain consistent”.

The first sentence of your white paper, “A database can provide strong consistency and system availability during network partitions.”, certainly seems to imply this (until one realizes later on that “system availability” holds only as long as some clients can execute their requests).

Furthermore, your white paper and blog article do not make it clear that there are applications for which “restricted availability” or “system availability” in case of a partition is not the optimal solution. This is the main point I was trying to make in my blog post.

Of course, such applications, which want all clients to remain able to execute all their requests in all cases (even when a partition happens), must sometimes allow the system to become inconsistent. This is the consequence of the CAP theorem. And they must fix the inconsistencies once the network partition is over. This is where “eventual consistency” comes in handy, because it means that the database will essentially heal itself.
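
To make the self-healing idea concrete, here is a minimal sketch of reconciliation after a partition. It uses last-write-wins with timestamps; the names and the strategy are illustrative only, since real systems (Dynamo, for example) often use vector clocks or application-level merges instead:

```python
# Toy sketch of eventual-consistency healing via last-write-wins (LWW).
# Each replica maps key -> (timestamp, value); newest write wins on merge.

def merge(replica_a, replica_b):
    """Reconcile two divergent replicas: for each key, keep the newest write."""
    healed = {}
    for key in replica_a.keys() | replica_b.keys():
        candidates = [r[key] for r in (replica_a, replica_b) if key in r]
        healed[key] = max(candidates, key=lambda entry: entry[0])
    return healed

# During the partition, both sides kept accepting writes and diverged:
side_a = {"cart": (10, ["book"]), "name": (5, "martin")}
side_b = {"cart": (12, ["book", "pen"]), "city": (7, "Cologne")}

# When the partition heals, both replicas converge to the same state:
print(merge(side_a, side_b)["cart"])   # → (12, ['book', 'pen'])
```

Note that LWW silently discards the older concurrent write to “cart”, which is precisely the kind of trade-off the articles below discuss in depth.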

Of course, all this is not a binary choice where “a database is either consistent or remains available”. There are many possible designs with many different trade-offs for dealing with this situation. For more in-depth information, the following articles are a good first step.

Werner Vogels discusses (among many other things) the various shades of “consistency”: “Amazon Dynamo” http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html.

Daniel Abadi discusses (again among other things) the correlations between the behaviour of the database in the case of a partition and during normal operation: “Problems with CAP” http://dbmsmusings.blogspot.de/2010/04/problems-with-cap-and-yahoos-little.html

And Eric Brewer addresses a huge lot of issues in his tour-de-force: “CAP twelve years later” http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed

Best regards,

martin

By: martin Schönert https://arangodb.com/2014/02/cap-google-spanner-survival-eventual-consistency/#comment-967 Thu, 06 Feb 2014 09:54:00 +0000

Hi Nick,

Thank you for your quick response. I will definitely read your white paper and post here all new insights. I must ask you for a little patience because I have two days packed with workshops ahead of me.

I am looking forward to meeting you in Cologne during NoSQL Matters.

Best,

martin

By: Nick Lavezzo https://arangodb.com/2014/02/cap-google-spanner-survival-eventual-consistency/#comment-966 Wed, 05 Feb 2014 22:32:00 +0000

Hi Martin,

We actually have a white paper on our website that goes into what we believe to be the misunderstanding: the difference between CAP Availability and overall database system availability in the more commonly understood sense (the system can still serve read/write requests). Here’s the link: foundationdb.com/white-papers/the-cap-theorem

You can also see how FoundationDB stays available (in the non-CAP sense) during a live partition in our fault tolerance demo video: foundationdb.com/blog/foundationdb-fault-tolerance-demo-video

Hopefully this sheds better light on our position on CAP. If you were to update your blog so our position is accurately reflected, that would be super cool 🙂

Side note – I may be in Cologne for NoSQL Matters in April – it would be good to get together if you are around.

Best,

Nick Lavezzo

By: martin Schönert https://arangodb.com/2014/02/cap-google-spanner-survival-eventual-consistency/#comment-965 Wed, 05 Feb 2014 17:18:00 +0000

Hi Dave,

thank you for your comment.

Yes, I am afraid I did not read the article quite the way you intended.

I did not understand that you wanted to make a distinction between “Availability” in the CAP sense and “High Availability” in the fault-tolerance sense. And probably the main reason I did not understand this is that I don’t know how one would define “High Availability” in the fault-tolerance sense.

Maybe you can help me and give me your definition. What are the faults that may happen? What do the clients (or some of the clients) experience when those faults happen?

Best,

martin

By: FoundationDB https://arangodb.com/2014/02/cap-google-spanner-survival-eventual-consistency/#comment-964 Wed, 05 Feb 2014 16:55:00 +0000

Hey Martin, thanks for the reply.

“He argues that Google Spanner “demonstrates the falsity of a trade-off between strong consistency and high availability”. In this article I show that Google Spanner does not disprove CAP…”

I’m not sure I wrote the article clearly enough, or that you read it quite the way I intended. I think we are all on the same page: neither Spanner nor FoundationDB disproves CAP! Both choose consistency.

In the article I am trying to discriminate between “Availability” in the CAP sense and “high availability” in the fault-tolerance sense. They are quite different and seem to confuse a lot of people new to this subject. Many people think that a consistent, transactional, distributed database cannot be fault-tolerant. That is false and that falsity *is* demonstrated by Spanner.

Thanks for the article. Great point about Google being among the companies most likely to opt for the CP side of the equation because of its technical and financial resources.

Best,
Dave Rosenthal
