Rendezvous with Riak CRDTs : Part 1
While working on Elixir/Erlang you are bound to chart in the territory of distributed systems, for me the the starting point was this talk by Ben Taylor on Riak Core wherein he showed how you can build a distributed stateful application using riak core (I have blogged about the same here).
For me the obvious next thing was to dive deep in to Riak KV, a distributed NoSql KV store. It was working on Riak KV that I stumbled upon CRDTs.
But before I dwell deep into CRDTs, we need to understand the problem that CRDTs solves.
One of the major benefit of having a NoSQL database is that it can scale pretty easily, these databases takes care of many thing like data replication across cluster, concurrent updates, data consistency, message redelivery, conflict resolution etc. However, all these things comes at a cost, for instance you may be able to scale up nodes in your database cluster, but you might have to compromise on consistency (which means that all nodes in your database may not return the same result) or there might substantial performance implications for resolving conflicts in data. Like this, there are many tradeoff that one has to make and not to mention the CAP theorem, which states that you can only chose only two out of consistency, availability and partition tolerance.
You might be wondering, where does CRDTs fits into all of this ? well, for starter CRTDs is a solution to C in CAP theorem where C stands for consistency.
Consistency: C in CAP
Usually when people say consistency, what they actually mean is Strong consistency. How can we define Strong consistency ? well if you have a series of updates, and every node in your cluster sees the updates in same order, it is called Strong consistency. But unfortunately, it is very hard to achieve when you a have a system which allows concurrent updates (which ideally you should be doing).
while Eventual consistency is when you allow replicas to diverge for a while on updated but they eventually coverages to the same value.
In order to allow concurrent updates we often have to give up on strong consistency. There are many flavour of consistency in any distributed systems that you can opt for, like strong consistency, eventual consistency, strong eventual consistency, optimistic consistency etc. Out of these flavours CRDTs allows you to have provable strong eventual consistency (key here is provable )this means that it guarantees a strong eventual consistency (SEC) and if you can settle for SEC, than you can have all three from CAP theorem.
Consistency and Replication Strategies:
Scalability is the biggest feature of any NoSQL database, which means that you usually have more that one node in DB cluster, As a result, if any update goes to one node, that update needs to be replicated to other nodes to maintain consistency of data.
To put it simple words, Data replication strategies can be grouped into two main categories.
- Synchronous
- Asynchronous
Synchronous Replication:
Synchronous data replication is straight forward, wherein if a update request come a one node, than we propagate the update to all of the other nodes and wait for acknowledgment from them, once it gets all acknowledgement, it reply back to the client saying write was successful. However this solution is not scalable.
This solution has the data replication in its critical path as it waits for data to be replicated across cluster before returning a response back to client.
Asynchronous Replication:
In this strategy, data replication is not in critical path which means that if a node receives a update request, it updates it’s local copy and immediately return a response to the client, In the background it tries to propagate the updates to other nodes.
Well, this sounds like a solution !!!!!!
Sorry to disappoint but this path is full of perils, In order to explain it better let me take Shopping cart as an example wherein user can add or remove item.
Problem 1: Order Independence
- Let’s say, user tries to add item A to the cart, this request goes to first node(let’s call it R1) of your DB cluster, after returning a response to User, it successfully propagates the updated to R2 (another replication Node in the same cluster).
- R2 receives the update and updates its local copy.
- User again adds item B to the cart, update again goes to R1 and R1 tries to propagates the same update to the R2, but this time R2 could not receive the update because of some network issues as a result R2 has only item A in its shopping cart.
- Unfortunately, User changes his mind and decides to remove the Item B from the cart. Again request goes to R1 which propagates it to R2 but now R2 is in dilemma, since R2 never added item B in its Cart, how is it supposed to remove it.
- Now R2 diverges and there is a conflict which must be resolved.
Problem 2: Duplication and Redelivery
- Let’s say we try to solve above problem by requiring every replication node to send an acknowledgement for each update so that we could attempt redelivery if any update fails to propagate.
- R1 tells R2 to add item A to the cart, and R2 after adding item A to the cart sends an acknowledgement to the R1, however this acknowledgement message is lost in the network.
- Since the acknowledgement is lost, R1 attempts redelivery of update (add(A)). R2 not knowing what is going on add again Item A to the cart, And Now R2 has two item A in its Cart.
- Now again R2 diverges and we have a conflict in our cluster.
Conflict Resolution : diverge → Rollback → Converge
Most of the distributed databases uses the divergence, rollback and converge cycle, which means that whenever a node diverges or has a conflict, system will rollback changes in the conflicting node and try the bring it in the same sate as other nodes.
In order to achieve this, nodes has to restore to something called as consensus algorithm (for example Paxos or Raft). In this algorithm each nodes talk to another node and tries to reconcile the conflict. For example if majority of nodes agrees that Item A was added only once, than all the nodes having two Item A, will rollback their changes and take changes that were decided by the consensus.
It might sound a fairly good solution except the fact the these consensus algorithm are notorious hard to implement. And Even if we get it right, there is huge performance penalties that need to paid for every conflict reconciliation because now for every conflict reconciliation each node needs to talk to another node to form a consensus on overall system state and this not scalable.
Enter CRDT:
According to Wikipedia:
In distributed computing, a conflict-free replicated data type (abbreviated CRDT) is a type of specially-designed data structure used to achieve strong eventual consistency (SEC) and monotonicity (absence of rollbacks) [1] . As their name indicates, a CRDT instance is distributed into several replicas; each replica can be mutated promptly and concurrently; the potential divergence between replicas is however guaranteed to be eventually reconciled through downstream synchronisation (off the critical path); [1] consequently CRDTs are known to be highly available.
Don’t worry if you don’t understand everything. Key words here are, Strong eventual consistency ,absence of rollback and synchronisation off the critical path.
This mean that if you are using CRDT data structures than you get following benefits out of the box.
- Strong Eventual consistency without Consensus or concurrency control which leads to high performing and scalable systems.
- Solves CAP theorem: if you accept SEC instead of Strong consistency than you can have all three.
- Any kind of update is allowed with no conflict because CRDTs guarantees that these update will eventually converge.
How do CRDTs achieve this awesomeness ?
CRDTs are specially designed data structure which follow some mathematical properties, which are
Commutative : Order Independence
Taking the example of shopping cart, this means that in CRDT following holds true:
add(item A) + add (item B) + remove (item B) = add(item A) + remove (item B) + add (item A)
this mean that even if a node revceives a delete operation before a add operation, it does not have have to resort to conflict reconciliation because of the this property of CRDT.
Idempotent : Immune to duplication and redelivery:
CRDTs can handle duplication and redelivery of same update, which means following holds true
add(item A) + add(item A) = add(A)
How does it work
- Whenever a node receives an update, it updates the local copy and propagate the same to other replicas.
- Any kind of update is allowed (for example deleting an item even it hasn’t been added) because we know that all of these updates will eventually converge though they may seem to be conflicting right now (remember !!! those mathematical properties).
I do not want to go into details of there implementation. But if you are interested you can read about them in this paper.
So, What’s the catch ? Why don’t every body uses them !!!
That is because of those magical mathamatical properties, not every operation or data structure can be modelled in this way. There are many CRDT data types but I’ll only talk about the types supported by Riak KV and these are:
- Counters : simple counter which can be decreased or increased.
- Flags : bolean flag
- Maps: Maps are the most versatile of the Riak data types because all other data types can be embedded within them, including maps themselves.
- Registers: just like string or a variable.
- Sets: Collection of unique value such as String.
In the Next post I will explain how you can model your domain models using Riak CRDTs and as bonus I will show how you can build a transformation library which will automatically convert your domain models to Riak CRDT types before you saves them to Riak KV.
Nice article