Comments on: Efficient Lock-Free Data Structure Protection | ArangoDB Blog https://arangodb.com/2015/08/lockfree-protection-of-data-structures-that-are-frequently-read/ The database for graph and beyond Tue, 25 Jun 2024 07:20:50 +0000 hourly 1 https://wordpress.org/?v=6.7.1 By: Max Neunhöffer https://arangodb.com/2015/08/lockfree-protection-of-data-structures-that-are-frequently-read/#comment-490 Fri, 14 Aug 2015 07:21:00 +0000 http://www.arangodb.com/?p=8442#comment-490 In reply to Brent Spell.

You are of course right. Unfortunately I did not yet see your post when I was making some changes last night, because it was still in moderation.

In the meantime I have changed the getMyId method to use compare_exchange_strong, which is OK, because that is only used in the rare case that a thread accesses any data protector instance for the first time, so it is not performance critical. Furthermore, the _last variable is now static, which makes more sense, as was pointed out by Josh above. The changes are in github and in the text of the article.

Thanks for your comment and your kind words.

]]>
By: Max Neunhöffer https://arangodb.com/2015/08/lockfree-protection-of-data-structures-that-are-frequently-read/#comment-489 Thu, 13 Aug 2015 20:54:00 +0000 http://www.arangodb.com/?p=8442#comment-489 In reply to Josh Haberman.

Thanks for your nice and thorough comments.

Indeed, this is RCU at work. Thanks for the references, I knew it is not new, but I did not know the name.

I think what is new in this implementation of RCU is that it is using C++11 standard tools only, that everything is nicely encapsulated in a single class, and that the usage is extremely simple: No preparation per thread is needed, nothing needs to be called on a regular basis from each thread and there is no limitation on the time a reader spends inspecting a version of the data.

This of course comes at a price, which is a pair of an atomic increment and an atomic decrement operation in usually uncontented memory. The measurements indicate that the costs for this are small in comparison to the read operations we usually do in ArangoDB, and small in comparison to a proper lock operation, at least on x86.

You are of course right, _last should be static such that threads using different DataProtector instances have a better chance to pick different slots. Thinking about it again, it occurs to me now that getMyId() is actually buggy, when multiple threads call it concurrently. I have improved the code in the github repository using a compare_exchange_strong call. As presented originally it could even lead to a buffer overrun!

One other advantage of this implementation is that correctness does not depend on the fact that different threads use different slots. Even if there are more threads than the compile time template parameter, proper protection is guaranteed, although there might be contention in some reference counters. In practice, I believe this problem can be neglected.

Therefore we propose this simple class as a practical implementation of RCU for read-world software.

]]>
By: Brent Spell https://arangodb.com/2015/08/lockfree-protection-of-data-structures-that-are-frequently-read/#comment-488 Thu, 13 Aug 2015 13:53:00 +0000 http://www.arangodb.com/?p=8442#comment-488 I believe your getMyId method has some problems. First, the id variable can be assigned to a value equal to Nr, which should result in a buffer overrun on the _list array.

Additionally, the test (_last > Nr) and assignment _last = 0 are not synchronized, so it is possible that several threads could be assigned the same ids, if _last were assigned to 0 by several threads in a concurrent schedule. This should not affect correctness, but it will cause some false sharing as multiple threads are assigned the same slot.

A simpler implementation without these problems would be to allow _last to increase without bound and assign slots mod Nr, something along the lines of the following.

int getMyId () {
if (_mySlot < 0)
_mySlot = _last++ % Nr;
return _mySlot;
}

]]>
By: Josh Haberman https://arangodb.com/2015/08/lockfree-protection-of-data-structures-that-are-frequently-read/#comment-487 Wed, 12 Aug 2015 23:15:00 +0000 http://www.arangodb.com/?p=8442#comment-487 I believe you may have re-invented RCU: https://en.wikipedia.org/wiki/Read-copy-update

This code follows the basic structure of RCU. The traditional RCU primitives correspond to DataProtector primitives like so:

– prot.use() is rcu_read_lock()
– prot.unUse() is rcu_read_unlock()
– prot.scan() is synchronize_rcu()

RCU was originally a kernel-side concept, but it has since expanded to user-space also, see:

http://liburcu.org/

https://lwn.net/Articles/573424/ (especially see the table about the tradoffs — fascinating!)

liburcu is clearly not as simple as what is described in this article. One reason is that they are using pre-C11 C, which has no standardized atomics or barriers. It is also much more aggressive about reducing overheads: the DataProtector code described here incurs a test plus a (possibly contended) atomic increment/decrement pair for every read critical section. liburcu offers several variants of the implementation, only one of which is close to this amount of read-side overhead.

I said “possibly contended” even though the code here endeavors not to have contention. But it appears it still can run into contention in the case where there are more threads than the compile-time template parameter of the DataProtector class. In that case the id space will wrap around and multiple threads will be assigned the same slot, leading to contention. The more you exceed this, the more contention you will get. This is an unfortunate drawback of this code’s simplicity.

Also, it seems like there is a bug in the code. _mySlot is thread-local, but _last is an instance variable. It seems like two DataProtectors (which will have independent _last values) could assign the same _mySlot to different threads. This would cause unnecessary contention even if you don’t exceed the thread limit. It seems like _last should be static (global).

I don’t mean this commentary to come off negatively. I think there is a *lot* of value in the way you have managed to factor this so that the DataProtector class is short and simple. Lots of lock-free algorithms have been known for a while, but in many cases they aren’t practical because they aren’t factored in such a way that they have convenient APIs. SMR/Hazard Pointers is a great example of this. I would love to see if it’s possible for this class to address these issues while still remaining simple.

]]>