Mastering Replication in Distributed Systems: Part 2 – Multi-Leader and Leaderless Models

HARSHA BALAKUMAR

Associate Director @ Tavant | Engineering leadership | Product management | Program management

Published: February 11, 2025

In the first part, we explored the benefits and trade-offs of replication in distributed systems, focusing on single-leader replication. While simple to implement, single-leader replication has limitations in certain scenarios. This is where other replication models—multi-leader and leaderless replication—come into play.

Multi-Leader Replication

In single-leader replication, all writes are directed to a single leader node, while reads can be served by either the leader or its followers. Multi-leader replication, on the other hand, allows several nodes (leaders) to accept writes, which helps when low write latency or high write availability is critical. Each leader typically replicates its changes asynchronously to the other leaders and to its followers. The model also handles failures more gracefully: if one leader goes down, the remaining leaders keep accepting writes while the failed one is recovered or replaced.

Use cases similar to Multi-Leader Replication

Applications on Devices with Intermittent Connectivity: Multi-leader replication concepts apply when an application must keep working while disconnected from the internet. Each device holds a local replica that accepts writes and syncs asynchronously once it is back online. This setup resembles multi-leader replication across data centers, except that each device acts as its own data center with an unreliable network connection.
Real-Time Collaborative Editing: Applications like Google Docs allow multiple users to edit a document simultaneously. Changes made by a user are first applied to their local replica before being asynchronously replicated to the server and other active users. If strict consistency is required, a document lock can be enforced, ensuring that only one user can edit at a time. This resembles single-leader replication. If real-time collaboration is prioritized, granular changes (e.g., keystrokes) are replicated without locks, introducing the complexities of multi-leader replication, including the need for conflict resolution.

Handling Write Conflicts

Multi-leader replication introduces the challenge of concurrent modifications to the same data in different locations. Conflict resolution strategies include:

Avoiding Conflicts: Routing all requests from a particular user to the same data center ensures single-leader behavior for that user.
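Last Write Wins (LWW): Each write carries a timestamp, and the write with the latest timestamp overwrites the others. All replicas converge on the same value, but concurrent writes that lose the comparison are silently discarded, so data can be lost.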
Application-Specific Conflict Resolution: Many multi-leader systems allow custom conflict resolution logic, executed either at write time or read time.
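
As a rough illustration of the Last Write Wins strategy above, here is a minimal Python sketch. The Versioned class, the lww_merge function, and the leader names are hypothetical and not taken from any particular database. The sketch assumes each leader stamps writes with its own wall-clock time and that ties are broken by leader ID so every replica picks the same winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    """A write as seen by the replication layer (hypothetical structure)."""
    value: str
    timestamp: float  # wall-clock time assigned by the leader that accepted the write
    leader_id: str    # tie-breaker when two timestamps are equal


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Return the winning write: highest timestamp first, then highest leader_id."""
    return max(a, b, key=lambda v: (v.timestamp, v.leader_id))


# Two leaders accept concurrent writes to the same key.
from_us = Versioned("title: Draft A", timestamp=100.0, leader_id="us-east")
from_eu = Versioned("title: Draft B", timestamp=100.5, leader_id="eu-west")

# Every replica applies the same rule, so all of them converge on one value.
winner = lww_merge(from_us, from_eu)
print(winner.value)
```

The losing write ("Draft A") is dropped without any error, which is the data-loss risk described above. Because the comparison depends on timestamps from different machines, clock skew between leaders can also make an older write win.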

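While powerful, multi-leader replication often comes with subtle configuration challenges. Features like auto-incrementing keys, triggers, and integrity constraints may introduce inconsistencies. For example, two leaders can independently hand out the same auto-incremented key, and a uniqueness constraint checked locally on each leader cannot see a conflicting row written on another.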

Leaderless Replication

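Both single-leader and multi-leader replication rely on at least one node acting as a leader to enforce write ordering. Leaderless replication removes this constraint, allowing any replica to accept writes directly. This model was popularized by Amazon's internal Dynamo system and is used in Dynamo-style databases such as Cassandra and Riak. (Amazon DynamoDB, despite the similar name, is a separate hosted service with a different architecture.)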

How Leaderless Replication Works

Clients can send write requests to multiple replicas directly or through a coordinator node.
Unlike leader-based models, no single node dictates write order.
Reads are also distributed across multiple nodes in parallel.
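
The points above can be sketched with a few in-memory replicas. This is only an illustration under stated assumptions: the Replica class and the write_all and read_latest helpers are hypothetical, the client supplies a version number with each write, and requests are issued in a simple loop rather than truly in parallel.

```python
from typing import Dict, List, Optional, Tuple


class Replica:
    """Hypothetical in-memory replica storing (version, value) per key."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, Tuple[int, str]] = {}

    def write(self, key: str, version: int, value: str) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        current = self.data.get(key)
        # Keep the write only if it is newer than what this replica already has.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key: str) -> Optional[Tuple[int, str]]:
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.data.get(key)


def write_all(replicas: List[Replica], key: str, version: int, value: str) -> int:
    """Send the write to every replica and return how many acknowledged it."""
    acks = 0
    for replica in replicas:
        try:
            replica.write(key, version, value)
            acks += 1
        except ConnectionError:
            pass  # there is no leader to fail over to; the client just counts acks
    return acks


def read_latest(replicas: List[Replica], key: str) -> Optional[Tuple[int, str]]:
    """Query every reachable replica and keep the response with the highest version."""
    responses = []
    for replica in replicas:
        try:
            result = replica.read(key)
        except ConnectionError:
            continue
        if result is not None:
            responses.append(result)
    return max(responses, default=None)


replicas = [Replica("a"), Replica("b"), Replica("c")]
write_all(replicas, "cart", 1, "milk")
replicas[2].up = False                       # replica c misses the next write
acks = write_all(replicas, "cart", 2, "milk, eggs")
replicas[2].up = True                        # c is back but still holds version 1
latest = read_latest(replicas, "cart")
print(acks, latest)
```

Replica c still holds the old value after it comes back, but the read compares versions across all responses and returns the newest one. No node decided the order of the writes; the version numbers carried with the data did.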

Ensuring Consistency with Quorums

Since leaderless replication lacks a designated leader to enforce consistency, quorum-based techniques are used:

Write Quorum (w): The minimum number of replicas that must acknowledge a write for it to be considered successful.
Read Quorum (r): The minimum number of replicas that must be queried for a read request.
Replication Factor (n): The total number of replicas storing the data.

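To ensure that reads see the latest write, the system follows the rule w + r > n. Under this rule, the set of replicas that acknowledged a write and the set queried by a read must overlap in at least one node, so at least one replica in every read operation has the latest data. For example: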

If n = 3, setting w = 2 and r = 2 allows the system to tolerate one node failure.
If n = 5, setting w = 3 and r = 3 enables tolerance for two failed nodes.
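
The arithmetic behind these examples can be checked with a short Python sketch. The function names are made up for illustration. The brute-force check simply enumerates every possible write set and read set to confirm that they always share a replica when w + r > n.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when the quorum rule w + r > n holds."""
    return w + r > n


def tolerated_failures(n: int, w: int, r: int) -> int:
    """Replicas that can be down while both reads and writes still reach quorum."""
    return n - max(w, r)


def always_intersect(n: int, w: int, r: int) -> bool:
    """Brute force: does every w-sized write set share a node with every r-sized read set?"""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


for n, w, r in [(3, 2, 2), (5, 3, 3), (3, 1, 1)]:
    print(n, w, r, quorums_overlap(n, w, r), always_intersect(n, w, r),
          tolerated_failures(n, w, r))
```

The last configuration (n = 3, w = 1, r = 1) stays available with two nodes down, but its read and write sets no longer have to overlap, so a read may miss the latest write.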

Challenges of Leaderless Replication

Leaderless replication offers high availability and low latency, but it introduces challenges such as stale reads. In addition, a network partition can cut a client off from most database nodes, leaving it unable to reach the w or r replicas needed for a quorum.

Sloppy Quorums and Hinted Handoff

To mitigate these issues, sloppy quorums still require w and r acknowledgments but accept them from any reachable nodes, not just the n replicas designated for the data. Once network connectivity is restored, the writes that were stored temporarily on other nodes are transferred to their intended replicas through a process called hinted handoff. However, sloppy quorums weaken consistency guarantees: even when w + r > n, reads may return outdated values until hinted handoff completes.
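
A minimal sketch of the idea follows. The Node class and the sloppy_write and hinted_handoff helpers are hypothetical, and the sketch assumes a deliberately simple model with no versioning: a handed-off write just overwrites whatever the intended replica holds.

```python
from typing import Dict, List, Tuple


class Node:
    """Hypothetical node: holds its own data plus hints kept on behalf of others."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, str] = {}
        self.hints: List[Tuple[str, str, str]] = []  # (intended node, key, value)


def sloppy_write(key: str, value: str, home: List[Node],
                 fallbacks: List[Node], w: int) -> bool:
    """Write to reachable home nodes; for each unreachable one, park a hint on a spare node."""
    acks = 0
    spare = [node for node in fallbacks if node.up]
    for target in home:
        if target.up:
            target.data[key] = value
            acks += 1
        elif spare:
            stand_in = spare.pop(0)
            stand_in.hints.append((target.name, key, value))
            acks += 1  # counts toward w even though it is not a home node
    return acks >= w


def hinted_handoff(holder: Node, nodes_by_name: Dict[str, Node]) -> None:
    """Forward stored hints to their intended nodes once those are reachable again."""
    remaining = []
    for intended, key, value in holder.hints:
        target = nodes_by_name[intended]
        if target.up:
            target.data[key] = value
        else:
            remaining.append((intended, key, value))
    holder.hints = remaining


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
home, fallbacks = [a, b, c], [d]
b.up = c.up = False                      # a partition hides two home nodes
ok = sloppy_write("cart", "milk", home, fallbacks, w=2)

b.up = True                              # b is reachable again
stale_before_handoff = "cart" not in b.data
hinted_handoff(d, {node.name: node for node in home})
print(ok, stale_before_handoff, b.data)
```

The write succeeds with w = 2 even though only one home node was reachable, because node d accepted it on behalf of b. Between b coming back and the handoff running, b does not have the value, which is the window in which a read could return outdated data.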

Comparing Replication Models

Each replication model has distinct advantages and trade-offs:

Single-Leader Replication: Simple to implement, with a consistent write order enforced by one node, but the leader is a single point of failure for writes.
Multi-Leader Replication: Increases write availability and failover resilience but introduces conflict resolution complexities.
Leaderless Replication: Maximizes availability and fault tolerance but requires quorum-based consistency mechanisms.

Each approach suits different use cases, and choosing the right replication strategy depends on factors like consistency requirements, failure tolerance, and network latency.

Replication plays a crucial role in distributed systems, ensuring data availability, fault tolerance, and performance optimization. While single-leader replication is straightforward, multi-leader and leaderless replication models address more complex requirements at the cost of increased system design complexity. Understanding these trade-offs is key to building resilient, scalable distributed systems.
