Exploring Data Replication in Depth
Bala Subramanian
Engineering Manager | Solution Architect | System Design | Top Systems Design Voice | CTO | Co Founder
Replication means making and keeping copies of data on multiple computers connected by a network. There are several reasons to do this:
Replication becomes complex when data changes frequently. There are different methods to update data across computers: single-leader, multi-leader, and leaderless replication.
We will discuss each method in detail, including its pros and cons.
Single Leader Replication
Each node with a copy of the database is called a replica. Every change to the database must be applied to all replicas; otherwise, some replicas will have stale data.
The most common solution is leader-based replication (also known as active/passive or master-slave replication).
Leader-Based (Master-Slave) Replication
Here's how it works:
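As a rough illustration, the leader-based flow can be sketched in a few lines of Python. This is a minimal in-memory sketch, not a real database: the Replica and Leader classes and their methods are hypothetical names, and the network is replaced by direct method calls. The leader applies each write locally, appends it to a replication log, and followers apply the log entries in the same order.

```python
class Replica:
    """Hypothetical in-memory replica holding a key-value store."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, entry):
        key, value = entry
        self.data[key] = value
        self.applied += 1


class Leader(Replica):
    """Accepts writes, logs them, and ships the log to followers."""

    def __init__(self, followers):
        super().__init__()
        self.log = []
        self.followers = followers

    def write(self, key, value):
        entry = (key, value)
        self.log.append(entry)
        self.apply(entry)
        # Send each follower the entries it has not applied yet, in log order.
        for follower in self.followers:
            for pending in self.log[follower.applied:]:
                follower.apply(pending)


followers = [Replica(), Replica()]
leader = Leader(followers)
leader.write("user:1", "alice")
leader.write("user:1", "alice@example.com")
```

In this sketch only the leader accepts writes, while any replica could serve reads from its own data dictionary.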
Synchronous vs Asynchronous Replication
Synchronous Replication:
Asynchronous Replication:
Hybrid Approach:
Some systems use a mix, called semi-synchronous replication:
This approach balances the benefits and drawbacks of both methods.
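A semi-synchronous write might look roughly like the sketch below, assuming one follower is treated as synchronous and the rest as asynchronous. The Follower class and the semi_sync_write function are hypothetical names used only for illustration, and the "network" is again plain method calls.

```python
from collections import deque


class Follower:
    """Hypothetical follower with a log and an inbox of unapplied entries."""

    def __init__(self, name):
        self.name = name
        self.log = []
        self.inbox = deque()

    def receive_sync(self, entry):
        # Applied immediately; the return value stands in for an acknowledgement.
        self.log.append(entry)
        return True

    def receive_async(self, entry):
        # Queued for later; the leader does not wait for this to be applied.
        self.inbox.append(entry)

    def drain(self):
        # Apply everything that has arrived so far, in arrival order.
        while self.inbox:
            self.log.append(self.inbox.popleft())


def semi_sync_write(entry, leader_log, sync_follower, async_followers):
    leader_log.append(entry)
    acked = sync_follower.receive_sync(entry)  # wait for this follower only
    for follower in async_followers:
        follower.receive_async(entry)  # fire and forget
    return acked


leader_log = []
f1, f2, f3 = Follower("f1"), Follower("f2"), Follower("f3")
ok = semi_sync_write("x=1", leader_log, f1, [f2, f3])
```

After the call returns, f1 is guaranteed to have the entry, while f2 and f3 only catch up once drain() runs. That gap is the replication lag the asynchronous followers are allowed to have.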
Handling New Node Addition
When adding a new replica:
A better approach:
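One commonly described way to bring up a new replica without stopping writes is to copy a snapshot tagged with a log position, then replay everything the leader logged after that position. The sketch below assumes that scheme; take_snapshot and catch_up are hypothetical helper names, and the log is a plain Python list of (key, value) pairs.

```python
def take_snapshot(leader_data, leader_log):
    """Copy the leader's data and remember the log position it corresponds to."""
    return dict(leader_data), len(leader_log)


def catch_up(follower_data, leader_log, from_pos):
    """Replay log entries written after the snapshot; return the new position."""
    for key, value in leader_log[from_pos:]:
        follower_data[key] = value
    return len(leader_log)


leader_data = {"a": 1, "b": 2}
leader_log = [("a", 1), ("b", 2)]

# The new follower starts from a consistent snapshot at position 2.
follower_data, position = take_snapshot(leader_data, leader_log)

# The leader keeps accepting writes while the snapshot is being copied.
leader_data["a"] = 3
leader_log.append(("a", 3))

# The follower requests everything after its snapshot position.
position = catch_up(follower_data, leader_log, position)
```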
Dealing with Node Outages
Nodes can fail due to hardware issues, network problems, or planned maintenance. The system should remain available despite node failures.
Follower Failure:
Leader Failure:
This requires "failover":
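A simplified failover decision can be sketched as follows, assuming the leader is declared dead after a heartbeat timeout and the follower with the most recent log position is promoted. The function names and the 30-second timeout are illustrative assumptions, and real systems add consensus and client re-routing on top of this.

```python
def leader_is_dead(last_heartbeat, now, timeout=30.0):
    """Treat the leader as failed if no heartbeat arrived within the timeout."""
    return now - last_heartbeat > timeout


def choose_new_leader(follower_positions):
    """Pick the follower with the highest replicated log position."""
    return max(follower_positions, key=follower_positions.get)


def failover(last_heartbeat, now, follower_positions, timeout=30.0):
    """Return the name of the promoted follower, or None if no failover is needed."""
    if not leader_is_dead(last_heartbeat, now, timeout):
        return None
    return choose_new_leader(follower_positions)


new_leader = failover(
    last_heartbeat=0.0,
    now=45.0,
    follower_positions={"f1": 10, "f2": 12, "f3": 7},
)
```

Choosing the most up-to-date follower limits data loss, but with asynchronous replication any writes the old leader had not yet shipped are still lost.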
Replication Log Implementation
1. Statement-based replication:
- The leader records the exact SQL statements (INSERT, UPDATE, DELETE) that modify data.
- These statements are sent to followers, who execute them in the same order.
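- Problems arise with nondeterministic functions like NOW() or RAND(), which may return a different value on each replica, and with auto-incrementing columns, which only stay consistent if every replica executes the statements in exactly the same order.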
- Some statements might have side effects or depend on the database's current state, making them unreliable for replication.
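The nondeterminism problem is easy to reproduce in miniature. In the sketch below, run_statement is a hypothetical stand-in for a replica executing a statement such as UPDATE ... SET token = RAND(), and the two seeds stand in for the independent random state of two replicas.

```python
import random


def run_statement(seed):
    """Stand-in for one replica evaluating RAND() while executing a statement."""
    rng = random.Random(seed)  # each replica has its own random state
    return rng.random()


# Statement-based: every replica re-evaluates the function and may disagree.
leader_value = run_statement(seed=1)
follower_value = run_statement(seed=2)
diverged = leader_value != follower_value

# A common workaround: the leader evaluates once and ships the literal result.
shipped_value = leader_value
follower_fixed = shipped_value
```

Shipping the evaluated value instead of the function call is essentially what pushes systems toward the row-based logging described next.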
2. Write Ahead Log (WAL):
- This is a low-level log of changes to database pages.
- It records exact byte changes in specific disk blocks.
- Highly efficient for the database engine but very version-specific.
- Makes it difficult to replicate between different database versions or systems.
- Replication process is closely tied to the internal workings of the storage engine.
3. Logical (row-based) log replication:
- Records changes at a higher level of abstraction.
- Logs might include: "Insert row in table X with these column values" or "Update row in table Y, set column A to value V where primary key is K".
- More flexible than WAL, allowing replication between different database versions or even different database systems.
- Easier for external applications to parse, useful for data warehousing or custom replication solutions.
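To make the row-based format concrete, here is a minimal sketch of a consumer applying logical log events to an in-memory table. The event layout (op, pk, values) is an assumption made for this example and not the wire format of any particular database.

```python
def apply_row_event(table, event):
    """Apply one logical log event to a table stored as {primary_key: row}."""
    op = event["op"]
    key = event["pk"]
    if op == "insert":
        table[key] = dict(event["values"])
    elif op == "update":
        table[key].update(event["values"])
    elif op == "delete":
        del table[key]
    else:
        raise ValueError(f"unknown operation: {op}")


events = [
    {"op": "insert", "pk": 1, "values": {"name": "Ada", "email": "ada@old.example"}},
    {"op": "update", "pk": 1, "values": {"email": "ada@new.example"}},
    {"op": "insert", "pk": 2, "values": {"name": "Bob", "email": "bob@example.com"}},
    {"op": "delete", "pk": 2},
]

replica_table = {}
for event in events:
    apply_row_event(replica_table, event)
```

Because each event carries concrete values rather than SQL text or disk blocks, the same stream could feed a follower, a data warehouse loader, or a cache invalidator.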
Multi-Leader Replication
Handling Write Conflicts:
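Conflict example: User A changes a customer's email in datacenter 1, while User B changes the same customer's phone number in datacenter 2. If the system detects conflicts per row rather than per field, these two concurrent writes conflict even though they touch different columns.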
Conflict resolution strategies:
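One widely used strategy is last write wins (LWW), where the write with the newest timestamp is kept. The sketch below applies it per field, which happens to resolve the email and phone example cleanly. It assumes every field carries a (timestamp, value) pair and that clocks across datacenters are comparable, which is a strong assumption in practice; merge_lww is a hypothetical helper name.

```python
def merge_lww(a, b):
    """Merge two versions of a record, keeping the newer write for each field.

    Each record maps field name -> (timestamp, value). On equal timestamps
    the value from `a` is kept.
    """
    merged = dict(a)
    for field, (timestamp, value) in b.items():
        if field not in merged or timestamp > merged[field][0]:
            merged[field] = (timestamp, value)
    return merged


# Datacenter 1 saw the email change; datacenter 2 saw the phone change.
dc1 = {"email": (105, "new@example.com"), "phone": (100, "555-0100")}
dc2 = {"email": (100, "old@example.com"), "phone": (107, "555-0199")}

merged = merge_lww(dc1, dc2)
```

LWW guarantees that replicas converge, but when two writes target the same field it silently discards one of them, so it only suits data where losing a concurrent update is acceptable.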
Single Leader vs Multi-Leader:
Leaderless Replication
Each replication strategy balances consistency, availability, and partition tolerance differently, as described by the CAP theorem. The best choice depends on your system's specific needs.
Handling Node Outages in Leaderless Systems
In leaderless systems, there's no failover process. For example:
To ensure data consistency and availability:
Quorum Writes and Reads:
This approach allows:
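The quorum condition is usually stated as w + r > n, where n is the number of replicas, w the number of nodes that must confirm a write, and r the number of nodes queried on a read. A minimal sketch, assuming each replica returns a (version, value) pair and the reader keeps the highest version; both function names are illustrative.

```python
def is_quorum(n, w, r):
    """True if every read set must overlap every write set in at least one node."""
    return w + r > n


def quorum_read(responses, r):
    """Return the value with the highest version among at least r responses."""
    if len(responses) < r:
        raise RuntimeError("not enough replicas responded")
    version, value = max(responses, key=lambda pair: pair[0])
    return value


n, w, r = 3, 2, 2
# Two replicas accepted the latest write; the third was down and is stale.
replicas = [(2, "new"), (2, "new"), (1, "old")]

# Even if the read hits the stale replica, the overlap yields the new value.
value = quorum_read(replicas[1:], r)
```

With n = 3 and w = r = 2 the system tolerates one unavailable node for both reads and writes while a read still overlaps with the most recent successful write.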
Version Vectors:
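A version vector keeps one counter per replica, and comparing two vectors tells you whether one write happened after the other or whether they were concurrent. The following is a minimal sketch of that comparison, assuming vectors are dictionaries mapping a node name to a counter and that a missing entry means zero; compare_vectors is a hypothetical name.

```python
def compare_vectors(a, b):
    """Classify the relationship between two version vectors."""
    nodes = set(a) | set(b)
    a_covers_b = all(a.get(node, 0) >= b.get(node, 0) for node in nodes)
    b_covers_a = all(b.get(node, 0) >= a.get(node, 0) for node in nodes)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a_newer"  # a has seen everything b has, and more
    if b_covers_a:
        return "b_newer"
    return "concurrent"  # neither has seen the other's writes: a conflict


newer = compare_vectors({"A": 2, "B": 1}, {"A": 1, "B": 1})
conflict = compare_vectors({"A": 2}, {"B": 1})
```

When the result is "concurrent", the system cannot simply overwrite one value with the other and must keep both as siblings or hand them to a conflict resolution strategy.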
We've explored data replication, including different configurations, how they work, potential issues, and solutions. The choice of replication strategy should be based on your specific system requirements and the trade-offs between consistency, availability, and partition tolerance.