Distributed Snapshots
Pratik Pandey
Senior Software Engineer at Booking.com | AWS Serverless Community Builder | pratikpandey.substack.com
Through my last series about?Clocks ?&?Version Vectors , the idea was to discuss problems establishing order/causality in a distributed system and ways to achieve the same. One problem in the real world, that needs both concepts is Distributed Snapshots!
Problem Statement
Point in Time Snapshots is critical for capturing the “consistent” state of systems, which can be restored in case of any loss of system state, making your system fault tolerant.
Taking a snapshot of one particular server is easy. You define a cut-off time and at that time, the state of the server(local state) at that exact time can be captured for the snapshot.
However, Snapshot in distributed systems, i.e on all the nodes in a cluster, is a challenging problem because nodes in a cluster don’t have a common/global clock . Hence, it’s cannot be guaranteed that all the nodes in the cluster will capture their local state at the same “instant”.
In addition to the “local” state, there could be additional states associated with the distributed system, which are?in transit?i.e messages send from?node-1?to?node-2 but haven’t arrived at?node-2?yet.
The other?constraint?during snapshots is that it should not be a “stop the world” process and it should not alter the actual computations!!
In short, we need the distributed snapshot to create a “Consistent” snapshot of the?global state?of the distributed system,?without impacting actual computations?on that system.
The Precursor to Solution
The algorithm used for capturing distributed snapshots is the?Chandy-Lamport algorithm (Yes! Leslie Lamport is also behind?Lamport Clocks ).
I’m writing the precursor to summarize the aspects of the paper which form the base for the actual solution. I’d highly recommend you read the paper.
Since we cannot establish a “global time”, we will communicate across nodes to help establish a common time. We can hence define our distributed system as a?set of processes(nodes) and channels, which can in turn be represented using a?directed graph.
Assumptions:
Model of the Distributed System:
An?Event e?can occur in process?P?and can change the state of?P?itself, and at most one channel in the system connected to?P. Hence, an event?e?could be represented by the tuple <P, s, s`, c, M> where -
P — Process
s — state of P before sending the event
s`- state of P after sending the event
c — channel who’s state is altered by the event.
M — the message passed on the channel
A global state of the system is composed of the set of processes and channel states in the system.?The initial global state is one where the processes are in their initial states, while the channels are empty since no messages have been exchanged.
Let’s take a simple example to understand how snapshots would work and where things could go wrong. We have two processes P & Q, with two uni-directional channels C1 & C2. A message/token M is present with P during the initial state and M can be passed around.
If you see the above system, only one instance of the message M is supposed to be present during any “global state” snapshot of the system. Let’s consider different scenarios -
From the above scenarios, you can see that the time at which the snapshot is taken across different components in the system is independent & can lead to inconsistencies in the snapshot!
Solution(Algorithm)
The solution relies on leveraging a Marker event, which is sent along the communication channels with actual messages. How we use the Marker to ensure we make consistent snapshot is the crux of the algorithm.
We can split the algorithm into different into multiple phases -
领英推荐
2.?Reception(Receiving Marker)
If a process Pj is receiving a marker on an incoming channel, we can have two scenarios. Let’s talk about the case where Pj receives its first Marker event.
In case Pj has already received a Marker event before -
3.?Termination
The entire process terminates when -
i) All processes have received a Marker(this implies that all processes have recorded their local state).
ii) All the channels have had a Marker event processed through them.
I’d highly recommend you try and follow the algorithm for a simple 3-node system. Attached is a sample 3 node system & how it would work -
Let’s try & understand what’s happening in the above diagram. Consider it as a fully connected directed graph, where P1, P2 & P3 are the 3 nodes, and C12, C23, C13, C31, C32 &C21 are the 6 directed edges.
In the above diagram, the states recorded are -
S1: [A, B]
S2: [C]
S3: [D, E]
C21: [F -> G]
C12, C13, C32, C31, C23: Emp
The combination of these states gives us a consistent global state. As you can see, the “global snapshot” is a combination of snapshots from processes as well as channels. Try out your own variation of snapshots to create a consistent global snapshot!
References/Good Reads:
This brings us to the end of this article. We covered the challenges of capturing a distributed snapshot and then looked into the Chandy-Lamport Algorithm and how it works! Please post comments on any doubts you might have and will be happy to discuss them.
--------------------------------------------------------------------------------------------------------------
Thank you for reading! I’ll be posting weekly content on distributed systems & patterns, so please like, share and subscribe to this?newsletter ?for notifications of new posts.
Please comment on the post with your feedback, will help me improve! :)
Until next time, Keep asking questions & Keep learning!
Senior Software Engineer at Booking.com | AWS Serverless Community Builder | pratikpandey.substack.com
2 年Received feedback to make things clearer for what's happening in the diagram. Have updated the blog with more details. Please check it out! Thank you Surinder Kumar Mehra for the feedback!
Senior Software Engineer at Booking.com | AWS Serverless Community Builder | pratikpandey.substack.com
2 年Please subscribe to my newsletter -?https://www.dhirubhai.net/newsletters/system-design-patterns-6937319059256397824/ Also, you can connect with me/book mock interviews with me on Topmate -?https://topmate.io/pratik_pandey You can follow me on Medium -?https://distributedsystemsmadeeasy.medium.com/subscribe