登录查看更多内容

Rendezvous with Riak CRDTs : Part 1

Naveen Negi

Senior Software Architect | Ex-Thoughtworks

发布日期: 2017年10月31日

While working on Elixir/Erlang you are bound to chart in the territory of distributed systems, for me the the starting point was this talk by Ben Taylor on Riak Core wherein he showed how you can build a distributed stateful application using riak core (I have blogged about the same here).

For me the obvious next thing was to dive deep in to Riak KV, a distributed NoSql KV store. It was working on Riak KV that I stumbled upon CRDTs.

But before I dwell deep into CRDTs, we need to understand the problem that CRDTs solves.

One of the major benefit of having a NoSQL database is that it can scale pretty easily, these databases takes care of many thing like data replication across cluster, concurrent updates, data consistency, message redelivery, conflict resolution etc. However, all these things comes at a cost, for instance you may be able to scale up nodes in your database cluster, but you might have to compromise on consistency (which means that all nodes in your database may not return the same result) or there might substantial performance implications for resolving conflicts in data. Like this, there are many tradeoff that one has to make and not to mention the CAP theorem, which states that you can only chose only two out of consistency, availability and partition tolerance.

You might be wondering, where does CRDTs fits into all of this ? well, for starter CRTDs is a solution to C in CAP theorem where C stands for consistency.

Consistency: C in CAP

Usually when people say consistency, what they actually mean is Strong consistency. How can we define Strong consistency ? well if you have a series of updates, and every node in your cluster sees the updates in same order, it is called Strong consistency. But unfortunately, it is very hard to achieve when you a have a system which allows concurrent updates (which ideally you should be doing).

while Eventual consistency is when you allow replicas to diverge for a while on updated but they eventually coverages to the same value.

In order to allow concurrent updates we often have to give up on strong consistency. There are many flavour of consistency in any distributed systems that you can opt for, like strong consistency, eventual consistency, strong eventual consistency, optimistic consistency etc. Out of these flavours CRDTs allows you to have provable strong eventual consistency (key here is provable )this means that it guarantees a strong eventual consistency (SEC) and if you can settle for SEC, than you can have all three from CAP theorem.

Consistency and Replication Strategies:

Scalability is the biggest feature of any NoSQL database, which means that you usually have more that one node in DB cluster, As a result, if any update goes to one node, that update needs to be replicated to other nodes to maintain consistency of data.

To put it simple words, Data replication strategies can be grouped into two main categories.

Synchronous
Asynchronous

Synchronous Replication:

Synchronous data replication is straight forward, wherein if a update request come a one node, than we propagate the update to all of the other nodes and wait for acknowledgment from them, once it gets all acknowledgement, it reply back to the client saying write was successful. However this solution is not scalable.

This solution has the data replication in its critical path as it waits for data to be replicated across cluster before returning a response back to client.

Asynchronous Replication:

In this strategy, data replication is not in critical path which means that if a node receives a update request, it updates it’s local copy and immediately return a response to the client, In the background it tries to propagate the updates to other nodes.

Well, this sounds like a solution !!!!!!

Sorry to disappoint but this path is full of perils, In order to explain it better let me take Shopping cart as an example wherein user can add or remove item.

Problem 1: Order Independence

Let’s say, user tries to add item A to the cart, this request goes to first node(let’s call it R1) of your DB cluster, after returning a response to User, it successfully propagates the updated to R2 (another replication Node in the same cluster).
R2 receives the update and updates its local copy.
User again adds item B to the cart, update again goes to R1 and R1 tries to propagates the same update to the R2, but this time R2 could not receive the update because of some network issues as a result R2 has only item A in its shopping cart.
Unfortunately, User changes his mind and decides to remove the Item B from the cart. Again request goes to R1 which propagates it to R2 but now R2 is in dilemma, since R2 never added item B in its Cart, how is it supposed to remove it.
Now R2 diverges and there is a conflict which must be resolved.

Problem 2: Duplication and Redelivery

Let’s say we try to solve above problem by requiring every replication node to send an acknowledgement for each update so that we could attempt redelivery if any update fails to propagate.
R1 tells R2 to add item A to the cart, and R2 after adding item A to the cart sends an acknowledgement to the R1, however this acknowledgement message is lost in the network.
Since the acknowledgement is lost, R1 attempts redelivery of update (add(A)). R2 not knowing what is going on add again Item A to the cart, And Now R2 has two item A in its Cart.
Now again R2 diverges and we have a conflict in our cluster.

Conflict Resolution : diverge → Rollback → Converge

Most of the distributed databases uses the divergence, rollback and converge cycle, which means that whenever a node diverges or has a conflict, system will rollback changes in the conflicting node and try the bring it in the same sate as other nodes.

In order to achieve this, nodes has to restore to something called as consensus algorithm (for example Paxos or Raft). In this algorithm each nodes talk to another node and tries to reconcile the conflict. For example if majority of nodes agrees that Item A was added only once, than all the nodes having two Item A, will rollback their changes and take changes that were decided by the consensus.

It might sound a fairly good solution except the fact the these consensus algorithm are notorious hard to implement. And Even if we get it right, there is huge performance penalties that need to paid for every conflict reconciliation because now for every conflict reconciliation each node needs to talk to another node to form a consensus on overall system state and this not scalable.

Enter CRDT:

According to Wikipedia:

In distributed computing, a conflict-free replicated data type (abbreviated CRDT) is a type of specially-designed data structure used to achieve strong eventual consistency (SEC) and monotonicity (absence of rollbacks) [1] . As their name indicates, a CRDT instance is distributed into several replicas; each replica can be mutated promptly and concurrently; the potential divergence between replicas is however guaranteed to be eventually reconciled through downstream synchronisation (off the critical path); [1] consequently CRDTs are known to be highly available.

Don’t worry if you don’t understand everything. Key words here are, Strong eventual consistency ,absence of rollback and synchronisation off the critical path.

This mean that if you are using CRDT data structures than you get following benefits out of the box.

Strong Eventual consistency without Consensus or concurrency control which leads to high performing and scalable systems.
Solves CAP theorem: if you accept SEC instead of Strong consistency than you can have all three.
Any kind of update is allowed with no conflict because CRDTs guarantees that these update will eventually converge.

How do CRDTs achieve this awesomeness ?

CRDTs are specially designed data structure which follow some mathematical properties, which are

Commutative : Order Independence

Taking the example of shopping cart, this means that in CRDT following holds true:

add(item A) + add (item B) + remove (item B) = add(item A) + remove (item B) + add (item A)

this mean that even if a node revceives a delete operation before a add operation, it does not have have to resort to conflict reconciliation because of the this property of CRDT.

Idempotent : Immune to duplication and redelivery:

CRDTs can handle duplication and redelivery of same update, which means following holds true

add(item A) + add(item A) = add(A)

How does it work

Whenever a node receives an update, it updates the local copy and propagate the same to other replicas.
Any kind of update is allowed (for example deleting an item even it hasn’t been added) because we know that all of these updates will eventually converge though they may seem to be conflicting right now (remember !!! those mathematical properties).

I do not want to go into details of there implementation. But if you are interested you can read about them in this paper.

So, What’s the catch ? Why don’t every body uses them !!!

That is because of those magical mathamatical properties, not every operation or data structure can be modelled in this way. There are many CRDT data types but I’ll only talk about the types supported by Riak KV and these are:

Counters : simple counter which can be decreased or increased.
Flags : bolean flag
Maps: Maps are the most versatile of the Riak data types because all other data types can be embedded within them, including maps themselves.
Registers: just like string or a variable.
Sets: Collection of unique value such as String.

In the Next post I will explain how you can model your domain models using Riak CRDTs and as bonus I will show how you can build a transformation library which will automatically convert your domain models to Riak CRDT types before you saves them to Riak KV.

Amit Chavan

7 年

Nice article

要查看或添加评论，请登录

Naveen Negi的更多文章

Embracing a Data-Driven Approach in Software Design

2023年9月8日

Embracing a Data-Driven Approach in Software Design

Where Object oriented design and DDD lacks Object oriented design (and by extension DDD) models the problem domain in…
3 Must-Read Books for Tech Leads Transitioning to Engineering Leadership

2023年9月3日

3 Must-Read Books for Tech Leads Transitioning to Engineering Leadership

Making the leap from a tech lead to an engineering leader is a significant career transition that requires a different…
Guiding with Intent and Empowering with Autonomy

2023年8月31日

Guiding with Intent and Empowering with Autonomy

Leadership is a delicate dance between steering the ship and empowering the crew to chart their course. An effective…
Navigating Software Complexity

2023年8月5日

Navigating Software Complexity

Introduction Software development is like sculpting in the digital realm, creating intricate structures and systems…
Embracing Diversity of Views: The Key to Consensus and Enriched Discussions

2023年8月4日

Embracing Diversity of Views: The Key to Consensus and Enriched Discussions

It takes multitude of view to make people realize they too have multitude. In our ever-evolving world, coming to an…
Tech leadership: Outcome-driven team lose in the long run

2021年6月28日

Tech leadership: Outcome-driven team lose in the long run

When people focus only on the outcome, they have no incentive to break from the norm. They would follow the same…
Taming complexity in Functional languages with Domain-driven design

2019年9月9日

Taming complexity in Functional languages with Domain-driven design

For quite some time I have been working in micro-services in Clojure. I feel Clojure works very well with…
First-time tech lead? Here is how you can take effective Architectural decisions

2019年6月8日

First-time tech lead? Here is how you can take effective Architectural decisions

Any software project has primarily two kinds of requirements, one is a set of the functional requirement (product…

1 条评论
Elixir: Domain driven design with Actor Model

2017年5月17日

Elixir: Domain driven design with Actor Model

Couple of months back, I was listening to Eric Evan’s pod cast on Software engineering radio where he mentioned that…

2 条评论
Riak_Core with Elixir : Part Three

2017年1月7日

Riak_Core with Elixir : Part Three

This is the third post in the riak_core series. Read part one here.

See all articles

Rendezvous with Riak CRDTs : Part 1

Naveen Negi

Senior Software Architect | Ex-Thoughtworks

Consistency: C in CAP

Consistency and Replication Strategies:

Synchronous Replication:

Asynchronous Replication:

Problem 1: Order Independence

Problem 2: Duplication and Redelivery

Conflict Resolution : diverge → Rollback → Converge

Enter CRDT:

Commutative : Order Independence

Naveen Negi的更多文章

社区洞察

其他会员也浏览了

State of the Graph: JSON-LD, GraphQL and Graph Serialization

Optimize GraphQL APIs with DataLoader: Eliminating the Dreaded N+1 Problem

Delta Lake, Iceberg & Hudi: A Transactional Perspective

Intro to the Iceberg Kafka Connect Sink

Choosing the number of partitions for a topic

Lakehouse Concurrency Controls: Are we too optimistic?

Bard, You Flip-Flopper

Record Level Indexing in Apache Hudi Delivers 70% Faster Point Lookups

Deep dive into Cloud databases- Transactions & ACID

Consistency: C in CAP

Consistency and Replication Strategies:

Synchronous Replication:

Asynchronous Replication:

Problem 1: Order Independence

Problem 2: Duplication and Redelivery

Conflict Resolution : diverge → Rollback → Converge

Enter CRDT:

Commutative : Order Independence

Naveen Negi的更多文章

Embracing a Data-Driven Approach in Software Design

3 Must-Read Books for Tech Leads Transitioning to Engineering Leadership

Guiding with Intent and Empowering with Autonomy

Navigating Software Complexity

Embracing Diversity of Views: The Key to Consensus and Enriched Discussions

Tech leadership: Outcome-driven team lose in the long run

Taming complexity in Functional languages with Domain-driven design

First-time tech lead? Here is how you can take effective Architectural decisions

Elixir: Domain driven design with Actor Model

Riak_Core with Elixir : Part Three

社区洞察

其他会员也浏览了

State of the Graph: JSON-LD, GraphQL and Graph Serialization

Optimize GraphQL APIs with DataLoader: Eliminating the Dreaded N+1 Problem

Delta Lake, Iceberg & Hudi: A Transactional Perspective

Intro to the Iceberg Kafka Connect Sink

Choosing the number of partitions for a topic

Lakehouse Concurrency Controls: Are we too optimistic?

Bard, You Flip-Flopper

Record Level Indexing in Apache Hudi Delivers 70% Faster Point Lookups

Deep dive into Cloud databases- Transactions & ACID