Two Phase Commit (2PC)

Two Phase Commit (2PC)

Two phase commit?(2PC) is a protocol used in many classic relational databases. For example, MySQL Cluster (not to be confused with the regular MySQL) provides synchronous replication using 2PC.?

In the first phase (voting), the coordinator sends the update to all the participants. Each participant processes the update and votes whether to commit or abort. When voting to commit, the participants store the update onto a temporary area (the write-ahead log). Until the second phase completes, the update is considered temporary.

In the second phase (decision), the coordinator decides the outcome and informs every participant about it. If all participants voted to commit, then the update is taken from the temporary area and made permanent.

Having a second phase in place before the commit is considered permanent is useful, because it allows the system to roll back an update when a node fails. In contrast, in primary/backup ("1PC"), there is no step for rolling back an operation that has failed on some nodes and succeeded on others, and hence the replicas could diverge.

2PC is prone to blocking, since a single node failure (participant or coordinator) blocks progress until the node has recovered. Recovery is often possible thanks to the second phase, during which other nodes are informed about the system state. Note that 2PC assumes that the data in stable storage at each node is never lost and that no node crashes forever. Data loss is still possible if the data in the stable storage is corrupted in a crash.

The details of the recovery procedures during node failures are quite complicated so I won't get into the specifics. The major tasks are ensuring that writes to disk are durable (e.g. flushed to disk rather than cached) and making sure that the right recovery decisions are made (e.g. learning the outcome of the round and then redoing or undoing an update locally).

But, it is not partition tolerant. The failure model that 2PC addresses does not include network partitions; the prescribed way to recover from a node failure is to wait until the network partition heals. There is no safe way to promote a new coordinator if one fails; rather a manual intervention is required. 2PC is also fairly latency-sensitive, since it is a write N-of-N approach in which writes cannot proceed until the slowest node acknowledges them.

2PC strikes a decent balance between performance and fault tolerance, which is why it has been popular in relational databases. However, newer systems often use a partition tolerant consensus algorithm, since such an algorithm can provide automatic recovery from temporary network partitions as well as more graceful handling of increased between-node latency.

I hope you found the article useful.

Happy Coding :)

要查看或添加评论,请登录

Umang Agarwal的更多文章

  • Push and Pull Configuration Management Tools

    Push and Pull Configuration Management Tools

    Push and pull configuration management tools are software solutions that facilitate the management and distribution of…

    10 条评论
  • What is Configuration Management in DevOps?

    What is Configuration Management in DevOps?

    Configuration management in DevOps refers to the process of managing and controlling the configuration of software…

    3 条评论
  • Principles of Web Distributed Systems Design

    Principles of Web Distributed Systems Design

    Like most things in life, taking the time to plan ahead when building a web service can help in the long run;…

  • What is non-monotonicity good for?

    What is non-monotonicity good for?

    The difference between monotonicity and non-monotonicity is interesting. For example, adding two numbers is monotonic…

  • Lamport Clocks

    Lamport Clocks

    Assuming that we cannot achieve accurate clock synchronization - or starting with the goal that our system should not…

  • Does time progress at the same rate everywhere?

    Does time progress at the same rate everywhere?

    We all have an intuitive concept of time based on our own experience as individuals. Unfortunately, that intuitive…

  • Partition and Replication

    Partition and Replication

    The manner in which a data set is distributed between multiple nodes is very important. In order for any computation to…

  • Forward Proxy vs Reverse Proxy Servers

    Forward Proxy vs Reverse Proxy Servers

    Forward Proxy A forward proxy is an intermediary that sits between one or more user devices and the internet. Instead…

  • Remote Procedure Calls (RPC)

    Remote Procedure Calls (RPC)

    In distributed computing, a remote procedure call (RPC) is when a computer program causes a procedure (subroutine) to…

  • Types of NoSQL databases

    Types of NoSQL databases

    NoSQL is a collection of data items represented in a key-value store, document store, wide column store, or a graph…

社区洞察

其他会员也浏览了