CAP Theorem
Abstract: The CAP theorem states that a distributed computing system cannot provide consistency, availability, and partition tolerance all at the same time.
According to University of California, Berkeley computer scientist Eric Brewer, the CAP theorem first appeared in 1998.
First, we give an overview of distributed processing, which helps in understanding the CAP theorem. Then we discuss the CAP theorem itself, covering consistency, availability, and partition tolerance with an example of each.
Keywords: Distributed Processing, Consistency, Availability, Partition Tolerance.
Introduction: Distributed processing, by definition, is a setup in which multiple individual computers (or processors) run an application together to provide more capability. In parallel processing, multiple computers work on a single application.
First of all, CAP in the CAP theorem stands for consistency, availability, and partition tolerance. These are attributes of a distributed system, that is, a system made up of multiple machines (computers) or nodes that all communicate with one another over a network.
Consistency promises that if I write something to one node and then read from another node, the read returns exactly what I wrote.
Availability promises that when I talk to a node, it must respond unless that node has failed. Availability permits nodes to fail, but any node that has not failed must respond.
Partition tolerance means that when the network is partitioned, the system still keeps whatever other promises it has made. A network is partitioned when messages can't flow from one machine to another. This might happen if you have two different data centers and the wide-area connection between them is severed. It can also happen if your laptop is one of the nodes and you undock it: the laptop is still running, it just can't communicate with the rest of the network.
So the CAP theorem says that we can have at most two of these three properties; we can't have all three.
Distributed Processing: Distributed systems are groups of networked machines that work toward the same goal. In distributed computing, each processor has its own private memory (distributed memory). Information is exchanged by passing messages between the processors.
Consider the following figure:
Figure: A Distributed System
P1, P2, P3 and P4 can communicate with each other. Each of them has its own memory and processor, and they contribute that memory and processing power to provide more capability for running an application.
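The message-passing idea described above can be illustrated with a minimal sketch. This is a single-process simulation, not real networking: the `Processor` class, its `inbox` queue, and the `send` and `process_inbox` methods are hypothetical names introduced here for illustration only.

```python
from collections import deque


class Processor:
    """A processor with private memory; peers can reach it only via messages."""

    def __init__(self, name):
        self.name = name
        self.memory = {}      # private: no other processor reads or writes this
        self.inbox = deque()  # messages delivered over the simulated network

    def send(self, other, key, value):
        # Passing a message is the only way to share information.
        other.inbox.append((self.name, key, value))

    def process_inbox(self):
        # Apply every received message to this processor's own memory.
        while self.inbox:
            _sender, key, value = self.inbox.popleft()
            self.memory[key] = value


processors = {name: Processor(name) for name in ("P1", "P2", "P3", "P4")}
p1 = processors["P1"]

# P1 computes something locally, then tells the others about it.
p1.memory["result"] = 42
for peer in processors.values():
    if peer is not p1:
        p1.send(peer, "result", 42)

before = dict(processors["P2"].memory)  # message sent but not yet processed
for proc in processors.values():
    proc.process_inbox()
after = dict(processors["P2"].memory)   # P2 has now applied the message
```

Until P2 processes its inbox, its private memory knows nothing about P1's result. That gap between sending and applying a message is where the consistency questions discussed later come from.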
Consistency: If I write something to one node and then read from another node, the read returns exactly what I wrote. To give this some real-world meaning, let's use a simple example: you want to buy a copy of a book from an online bookstore. While visiting the site you find a book named "A Brief History of Time" and add it to your favourites list. Another person also wants to buy this book and adds it to his favourites list, but there is only one copy in stock. If both customers can continue through the order process to the end (i.e. make payment), the mismatch between what is actually in stock and what the system believes is in stock will cause a problem. So consistency is important in this type of system.
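The bookstore problem can be sketched in a few lines. The example below is an illustration under simplifying assumptions: the `Replica` class and its `order` method are hypothetical, and "synchronized" is modeled simply as both customers talking to the same copy of the inventory.

```python
class Replica:
    """One copy of the store's inventory."""

    def __init__(self, stock):
        self.stock = dict(stock)  # local view of how many copies are left

    def order(self, title):
        # Accept the order only if this replica believes a copy is in stock.
        if self.stock.get(title, 0) > 0:
            self.stock[title] -= 1
            return True
        return False


title = "A Brief History of Time"

# Inconsistent case: two replicas each believe one copy is left,
# and they never tell each other about accepted orders.
replica_a = Replica({title: 1})
replica_b = Replica({title: 1})
unsynced = [replica_a.order(title), replica_b.order(title)]

# Consistent case: both customers see the same, up-to-date inventory.
shared = Replica({title: 1})
synced = [shared.order(title), shared.order(title)]
```

In the unsynchronized case both orders are accepted even though only one copy exists; in the consistent case the second order is rejected.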
Availability: When I talk to a node, it must respond unless that node has failed. Let's take an example: you wake up in the middle of the night feeling hungry. You order food online, but the restaurant calls to tell you that, for some reason, they can't deliver right now. Upset, you open your freezer and eat whatever is in it. The freezer had not failed, so it answered your request, even if not with the meal you wanted. That is availability.
Partition Tolerance: Partition tolerance means that when the network is partitioned, the system still keeps whatever other promises it has made. A network is partitioned when messages can't flow from one machine to another. Let's take another example: you connect your phone to your laptop by cable to transfer a movie, and after some of the data has been transferred the connection is somehow lost. That lost connection is a network partition; a partition-tolerant system is one that keeps its promises even when this happens.
CAP Theorem: The CAP theorem states that a distributed computing system cannot provide partition tolerance, consistency, and availability at the same time. Let's discuss it in detail. Suppose I have a trivial distributed system of just two nodes. If I write to one node and then read from the other, and suppose messages can't get from one node to the other, what happens?
Basically, three things could happen. First, the node being read could return the best version it has, which would be older than the one I wrote. If it does that, the system is not consistent.
Jon can read "Hi" from N2 because "Hi" is available on N2, but because of the partition the newer message "How?" can't reach N2. So the read is not consistent, and this system can only provide availability and partition tolerance.
Second, if we don't want to return an older version, and we want to be sure we return what the user just wrote and are therefore consistent, then we can wait. But if no messages are getting through, there is no way for the new version to arrive. Whether the second node tries to read from the first node or the first node tries to write to the second, no matter what algorithm we choose, we can't get the new version of the message onto the second node. In that case the system is not available.
In this case the system is not available, but it provides consistency and partition tolerance.
Third, the messages may actually get through, so now I can be consistent and available, but only because there is no partition; the system is not partition tolerant.
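The three outcomes above can be reproduced with a small two-node simulation. This is a sketch under stated assumptions, not a real replication protocol: `Network`, `Node`, `Unavailable`, `scenario`, and the `"AP"`/`"CP"` mode strings are all hypothetical names, and "waiting forever" is modeled as an immediate refusal to answer.

```python
class Unavailable(Exception):
    """Raised when a node declines to answer rather than risk a stale reply."""


class Network:
    def __init__(self, partitioned=False):
        self.partitioned = partitioned  # True: no messages flow between nodes


class Node:
    def __init__(self, name, network, value):
        self.name = name
        self.network = network
        self.value = value
        self.peer = None

    def write(self, value):
        self.value = value
        # Replication succeeds only if the message can reach the peer.
        if not self.network.partitioned:
            self.peer.value = value

    def read(self, mode):
        # "CP": refuse to answer while the peer is unreachable, because the
        # local copy may be out of date. "AP": always answer from the local copy.
        if mode == "CP" and self.network.partitioned:
            raise Unavailable(f"{self.name} cannot confirm the latest value")
        return self.value


def scenario(partitioned, mode):
    """Write "How?" to N1, then read from N2 and report what comes back."""
    network = Network(partitioned)
    n1 = Node("N1", network, "Hi")
    n2 = Node("N2", network, "Hi")
    n1.peer, n2.peer = n2, n1
    n1.write("How?")
    try:
        return n2.read(mode)
    except Unavailable:
        return "no response"


stale = scenario(partitioned=True, mode="AP")     # answers, but with old data
refused = scenario(partitioned=True, mode="CP")   # consistent, but no answer
healthy = scenario(partitioned=False, mode="AP")  # no partition: both hold
```

With a partition, the `"AP"` read returns the old "Hi" and the `"CP"` read gets no response; only without a partition does the read return "How?" in either mode.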
Conclusion: In a network subject to communication failures, it is impossible for any web service to implement an atomic read/write shared memory that guarantees a response to every request.