Understanding the CAP Theorem
The theorem states that a networked shared-data system can guarantee at most two of the following three properties:
- Consistency - A guarantee that every node in a distributed cluster returns the same, most recent successful write, so every client has the same view of the data. There are various consistency models. The consistency used in the proof of the CAP theorem is linearizability (also called atomic consistency), a very strong form of consistency.
- Availability - A guarantee that every request receives a response indicating whether it succeeded or failed. Whether you read or write, you get some response back.
- Partition Tolerance - The system continues to operate despite arbitrary message loss or failure of part of the system. Even when communication among the nodes is cut, the system still works.
The CAP theorem is often misunderstood. It is not a free choice of any 2 out of 3. The key point is that P is not visible to your customer: it is a technology concern that underlies C and A. The customer can only experience C and A.
Partitions are caused by wires, electricity, software, and hardware, none of which we fully control, so the network will not always stay connected. While the network is healthy, there is no conflict between A and C (apart from latency). The problem arises when a partition occurs. We then have two choices to make: refuse or delay requests to preserve C, or keep answering with possibly stale data to preserve A.
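The choice described above can be sketched with a toy replica. This is only an illustration under stated assumptions: the names Replica, Unavailable, prefer_consistency, and read_after_partition are hypothetical and do not come from any real database.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer in order to stay consistent."""


class Replica:
    """Hypothetical replica that receives writes forwarded by a primary."""

    def __init__(self, prefer_consistency):
        self.prefer_consistency = prefer_consistency
        self.value = None
        self.partitioned = False  # True while cut off from the primary

    def apply_replicated_write(self, value):
        # Replication messages only arrive while the network is healthy.
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.prefer_consistency:
            # Choice 1: give up availability rather than risk a stale answer.
            raise Unavailable("cannot confirm the latest value during a partition")
        # Choice 2 (or a healthy network): answer with whatever we have.
        return self.value


def read_after_partition(prefer_consistency):
    replica = Replica(prefer_consistency)
    replica.apply_replicated_write("v1")
    replica.partitioned = True
    replica.apply_replicated_write("v2")  # lost: the replica is unreachable
    try:
        return replica.read()
    except Unavailable:
        return "unavailable"
```

With prefer_consistency=True the read is refused, and with prefer_consistency=False the replica answers with the stale value "v1" even though "v2" was written elsewhere.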
The C and A in ACID represent different concepts than the C and A in the CAP theorem.
The CAP theorem categorizes systems into three categories:
- CP (Consistent and Partition Tolerant) - At first glance, the CP category is confusing: it seems to describe a system that is consistent and partition tolerant but never available. In fact, CP refers to a category of systems where availability is sacrificed only in the case of a network partition.
- CA (Consistent and Available) - CA systems are consistent and available in the absence of any network partition. Single-node DB servers are often categorized as CA systems, since they do not need to deal with partition tolerance. The only hole in this theory is that a single-node DB system is not a networked shared-data system and thus does not fall under the purview of CAP.
- AP (Available and Partition Tolerant) - These are systems that stay available and partition tolerant but cannot guarantee consistency while a partition lasts.
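To make the CP and AP categories concrete, here is a minimal sketch of a two-node cluster that handles writes differently depending on its mode. Everything in it (Cluster, WriteRejected, the node names n1 and n2, write_during_partition) is a hypothetical name chosen for illustration, and real systems replicate data in far more sophisticated ways.

```python
class WriteRejected(Exception):
    """Raised when a CP-mode cluster refuses a write during a partition."""


class Cluster:
    """Hypothetical two-node cluster running in either "CP" or "AP" mode."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data = {"n1": None, "n2": None}
        self.partitioned = False

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: the write reaches every node.
            for name in self.data:
                self.data[name] = value
            return
        if self.mode == "CP":
            # CP: sacrifice availability, but only while partitioned.
            raise WriteRejected("cannot replicate during a partition")
        # AP: accept the write locally, so the nodes may diverge.
        self.data[node] = value

    def is_consistent(self):
        return len(set(self.data.values())) == 1


def write_during_partition(mode):
    cluster = Cluster(mode)
    cluster.write("n1", "a")      # replicated to both nodes
    cluster.partitioned = True
    try:
        cluster.write("n2", "b")  # attempted while the nodes cannot talk
        accepted = True
    except WriteRejected:
        accepted = False
    return accepted, cluster.is_consistent()
```

In this sketch, write_during_partition("CP") rejects the second write and both nodes still agree, while write_during_partition("AP") accepts it and leaves n1 and n2 holding different values.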