Distributed Systems - Replication
Replication means keeping a copy of the same data on multiple machines that are connected via a network.
Reasons for doing replication
- To keep data geographically close to your users, reducing latency
- To allow the system to continue working even if some of its parts have failed, providing high availability
- To scale out the number of machines that can serve read queries, increasing read throughput
How leader-follower systems work
- One of the nodes is designated as the leader and the others as followers
- Clients must send write requests to the leader only
- Clients can send read requests to either the leader or the followers
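As a rough illustration, here is a minimal Python sketch of this routing rule; the node addresses and the random read policy are made up for the example:

```python
# Minimal sketch of leader-based routing: writes go to the leader,
# reads can be served by any replica. Addresses are hypothetical.
import random

LEADER = "db-leader:5432"
FOLLOWERS = ["db-follower-1:5432", "db-follower-2:5432"]

def route_write(statement: str) -> str:
    """All writes go to the leader."""
    return LEADER

def route_read(query: str) -> str:
    """Reads may be served by the leader or any follower."""
    return random.choice([LEADER] + FOLLOWERS)

print(route_write("INSERT INTO users ..."))   # always the leader
print(route_read("SELECT * FROM users ..."))  # any replica
```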
Types of replication
- Asynchronous - the leader sends the write to its followers but does not wait for confirmation that they have received it before reporting success to the client
- Synchronous - the leader waits for confirmation from the followers that they have received the write before reporting success to users and other clients
Replication high availability tip
The advantage of synchronous replication is that the follower is guaranteed to have an up-to-date copy of the data that is consistent with the leader. If the leader suddenly fails, we can be sure that the data is still available on the follower. However, it is impractical for all followers to be synchronous: any one node outage would cause the whole system to grind to a halt. In practice, you make one of the followers synchronous and the others asynchronous. This configuration is called semi-synchronous.
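The sketch below contrasts the three modes with a toy in-process model; the Follower class, the delays, and the thread pool are illustrative stand-ins for real replicas and network calls, not any database's actual API:

```python
# Toy model of synchronous, asynchronous, and semi-synchronous writes.
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

class Follower:
    def __init__(self, name, delay):
        self.name, self.delay = name, delay
    def apply(self, record):
        time.sleep(self.delay)              # simulate network + apply time
        return f"{self.name} acked {record}"

followers = [Follower("f1", 0.05), Follower("f2", 0.2)]
pool = ThreadPoolExecutor(max_workers=4)

def write_sync(record):
    # Fully synchronous: wait for every follower before confirming.
    futures = [pool.submit(f.apply, record) for f in followers]
    for fut in futures:
        fut.result()
    return "committed (all followers have the write)"

def write_async(record):
    # Asynchronous: ship the write and confirm immediately.
    for f in followers:
        pool.submit(f.apply, record)
    return "committed (followers may still be catching up)"

def write_semi_sync(record):
    # Semi-synchronous: wait for at least one follower, let the rest lag.
    futures = [pool.submit(f.apply, record) for f in followers]
    wait(futures, return_when=FIRST_COMPLETED)
    return "committed (at least one follower has the write)"

print(write_sync("x=1"))
print(write_async("x=2"))
print(write_semi_sync("x=3"))
```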
Steps for setting up new followers
From time to time, you need to set up new followers—perhaps to increase the number of replicas, or to replace failed nodes.
- Take a consistent snapshot of the leader's database at some point in time, if possible without taking a lock on the entire database.
- Copy the snapshot to the new follower node.
- The follower connects to the leader and requests all the data changes that have happened since the snapshot was taken.
- When the follower has processed the backlog of data changes since the snapshot, we say it has caught up. It can now continue to process data changes from the leader as they happen.
Note: In some systems, such as Elasticsearch, this process is fully automated.
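Here is a rough Python sketch of the snapshot-and-catch-up idea using an in-memory log; real systems track a position in the leader's replication log (for example, a log sequence number), and all names here are hypothetical:

```python
# In-memory sketch: the leader keeps an ordered change log, a snapshot
# records the log offset it covers, and the new follower replays every
# change made after that offset.
leader_log = []                        # list of (offset, change) pairs

def leader_write(change):
    leader_log.append((len(leader_log), change))

def take_snapshot():
    """Consistent snapshot: a copy of the data plus the offset it covers."""
    return {"data": [c for _, c in leader_log], "offset": len(leader_log)}

def bootstrap_follower(snapshot):
    follower = {"data": list(snapshot["data"]), "offset": snapshot["offset"]}
    # Catch-up: request every change made since the snapshot was taken.
    for offset, change in leader_log[follower["offset"]:]:
        follower["data"].append(change)
        follower["offset"] = offset + 1
    return follower

leader_write("a"); leader_write("b")
snap = take_snapshot()
leader_write("c")                      # write that happens during the copy
print(bootstrap_follower(snap))        # the follower ends up with a, b and c
```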
Node outages
A node outage can occur for the following reasons, and it can affect either the leader or a follower.
- Fault
- Planned maintenance
Catch-up recovery for a follower
- On its local disk, each follower should keep a log of the data changes it has received from the leader.
- If a follower goes down, on restart it can read its log, request from the leader all the changes that occurred while it was disconnected, and catch up.
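A minimal sketch of that idea, assuming a hypothetical state file and a placeholder fetch_changes_since() helper standing in for the request to the leader:

```python
# Sketch of follower catch-up recovery: persist the last applied log
# position, and on restart ask the leader for everything after it.
import json, os

STATE_FILE = "follower_state.json"     # hypothetical local state file

def load_position() -> int:
    if not os.path.exists(STATE_FILE):
        return 0
    with open(STATE_FILE) as f:
        return json.load(f)["last_applied"]

def save_position(pos: int) -> None:
    with open(STATE_FILE, "w") as f:
        json.dump({"last_applied": pos}, f)

def fetch_changes_since(pos: int):
    # Placeholder for "ask the leader for every change after position pos".
    return [(pos + i, f"change-{pos + i}") for i in range(3)]

def recover() -> int:
    pos = load_position()
    for new_pos, change in fetch_changes_since(pos):
        # applying the change to local storage would happen here
        pos = new_pos + 1
    save_position(pos)
    return pos

print("caught up to position", recover())
```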
Failover for the leader
Leader failover is the harder problem in production, for the following reasons:
- The system needs to elect a new leader, which is an expensive process
- The system cannot serve write requests until a new leader has been elected
Failover can be manual or automatic and follows these steps:
- Determine that the leader has failed.
- Choose a new leader.
- Reconfigure the system to use the new leader.
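Below is a simplified sketch of those steps under assumed data structures; the heartbeat timestamps, log positions, and the 30-second timeout are illustrative, and real systems use a consensus protocol rather than this naive election:

```python
# Naive failover sketch: detect the dead leader via a heartbeat timeout,
# promote the most up-to-date live replica, and repoint the cluster at it.
import time

TIMEOUT_S = 30
nodes = {
    "node-a": {"last_heartbeat": time.time() - 60, "log_position": 120, "role": "leader"},
    "node-b": {"last_heartbeat": time.time(),      "log_position": 118, "role": "follower"},
    "node-c": {"last_heartbeat": time.time(),      "log_position": 115, "role": "follower"},
}

def leader_is_dead() -> bool:
    leader = next(n for n in nodes.values() if n["role"] == "leader")
    return time.time() - leader["last_heartbeat"] > TIMEOUT_S

def failover():
    if not leader_is_dead():
        return None
    # Choose the live follower with the most complete log as the new leader,
    # which minimises the number of writes that have to be discarded.
    alive = {name: n for name, n in nodes.items()
             if time.time() - n["last_heartbeat"] <= TIMEOUT_S}
    new_leader = max(alive, key=lambda name: alive[name]["log_position"])
    for name, node in nodes.items():
        node["role"] = "leader" if name == new_leader else "follower"
    return new_leader

print("new leader:", failover())   # node-b: highest log position among live nodes
```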
Problems with leader failover
- If asynchronous replication is used, the new leader may not have received all the writes from the old leader before it failed; those unreplicated writes are usually discarded, which violates clients' durability expectations.
- Discarding writes is especially dangerous if other storage systems outside of the database need to be coordinated with the database contents.
- In certain fault scenarios, two nodes may both believe that they are the leader. This situation is called split brain, and it is dangerous: if both accept writes and there is no process for resolving conflicts, data is likely to be lost or corrupted.
- What is the right timeout before the leader is declared dead? A longer timeout means a longer time to recovery when the leader fails, but too short a timeout risks unnecessary failovers (for example, during a temporary load spike or network glitch).
Problems in replication
When an insert or update happens on the leader, it takes some time for the change to replicate to all nodes. Typically this lag is well under a second, but it can grow to minutes when there is node downtime or network trouble. If a user reads from a replica that has not yet caught up with the leader, they will not see the latest data. This delay is called replication lag, and it leads to the following anomalies.
1. Reading your own writes - A user makes a write, followed by a read from a stale replica. To prevent this anomaly, we need read-after-write consistency.
Solution - Read data that the user may have modified from the leader, for example by recording the time of each user's last write in centralized metadata and routing that user's reads to the leader until the replicas have caught up.
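A small sketch of that routing rule, assuming a hypothetical centralized map of last-write times and an illustrative one-minute replication window:

```python
# Read-after-write routing: send a user's reads to the leader while their
# recent write may still be missing on the followers.
import time

last_write_at = {}            # user_id -> timestamp of the user's last write
REPLICATION_WINDOW_S = 60     # assume replicas catch up within a minute

def record_write(user_id: str) -> None:
    last_write_at[user_id] = time.time()

def choose_replica(user_id: str) -> str:
    wrote_recently = time.time() - last_write_at.get(user_id, 0) < REPLICATION_WINDOW_S
    return "leader" if wrote_recently else "follower"

record_write("alice")
print(choose_replica("alice"))   # leader: her write may not be on followers yet
print(choose_replica("bob"))     # a follower is fine
```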
2. Monotonic reads - A user first reads from a fresh replica, then from a stale replica. Time appears to go backward. To prevent this anomaly, we need monotonic reads.
Solution - One way of achieving monotonic reads is to make sure that each user always makes their reads from the same replica (different users can read from different replicas). For example, the replica can be chosen based on a hash of the user ID, rather than randomly. However, if that replica fails, the user’s queries will need to be rerouted to another replica.
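A minimal sketch of pinning each user to a replica by hashing the user ID; the replica names are illustrative:

```python
# Monotonic reads: the same user always reads from the same replica,
# so their reads never appear to move backward in time.
import hashlib

REPLICAS = ["replica-1", "replica-2", "replica-3"]

def replica_for(user_id: str) -> str:
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return REPLICAS[digest % len(REPLICAS)]

print(replica_for("alice"), replica_for("alice"))  # same replica twice
print(replica_for("bob"))                          # possibly a different one
```

If the chosen replica fails, the user's queries would need to be rerouted to another replica, as noted above.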
3. Consistent Prefix reads - If some partitions are replicated slower than others, an observer may see the answer before they see the question.
Solution - One solution is to make sure that any writes that are causally related to each other are written to the same partition—but in some applications that cannot be done efficiently.
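A small sketch of that idea, partitioning messages by a conversation ID so that causally related writes stay together; the partition count and field names are illustrative:

```python
# Consistent prefix reads: route a question and its answer to the same
# partition by hashing a shared conversation ID, so they replicate in order.
import hashlib

NUM_PARTITIONS = 4

def partition_for(conversation_id: str) -> int:
    digest = int(hashlib.sha256(conversation_id.encode()).hexdigest(), 16)
    return digest % NUM_PARTITIONS

question = {"conversation_id": "chat-42", "text": "How far into the future can you see?"}
answer   = {"conversation_id": "chat-42", "text": "About ten seconds."}

# Both land in the same partition, so no reader can see the answer
# before the question.
print(partition_for(question["conversation_id"]) ==
      partition_for(answer["conversation_id"]))   # True
```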