登录查看更多内容

Mastering Data Distribution with Consistent Hashing

Jayaraj Singh

SDE 1 Backend Developer @ InterviewVector

发布日期: 2024年7月6日

Building scalable and fault-tolerant systems requires efficiently distributing data across multiple servers. Traditional hashing methods like key % n often disrupt mappings when scaling, leading to significant maintenance challenges. This is where consistent hashing shines.

The Advantages of Consistent Hashing

- Minimised Reorganisation: Only a small fraction of keys need to be remapped when nodes are added or removed, unlike traditional hashing.

- Enhanced Load Balancing: Virtual replicas ensure even data distribution, preventing some servers from becoming overloaded.

How Consistent Hashing Operates:

1. Server Hashing: Servers are hashed to points on a circular ring.

2. Key-to-Server Mapping: Keys are hashed and assigned to the nearest server in a clockwise direction.

3. Dynamic Scaling: Adding or removing servers only requires reassignment of keys near the affected area.

4. Virtual Replicas for Load Balancing: Servers are mapped to multiple points on the ring for better distribution.

Detailed Example:

Scenario 1: Adding a New Server

Imagine three servers, A, B, and C, hashed to points on a ring from 0 to 256. The keys are distributed as follows:

- Server A handles keys from 0 to 85.

- Server B handles keys from 86 to 170.

- Server C handles keys from 171 to 255.

Now, we add a new server, D, which hashes to point 128. The new distribution would be:

- Server A: 0 to 85 (unchanged)

- Server B: 86 to 127

- Server D: 128 to 170 (takes over some key space from B)

- Server C: 171 to 255 (unchanged)

领英推荐

Storage Decision Guide for Microsoft 365 Backups

Veeam Software 2 年前

How to install Veeam Backup & Replication

Veeam Software 2 年前

Recover Deleted files from Azure virtual machine…

Ehtisham Ali 6 个月前

In this scenario, only the keys from Server B that fall between 128 and 170 need to be reassigned to Server D. The rest of the keys remain unaffected, showcasing the efficiency of consistent hashing in minimizing disruption.

Scenario 2: Removing a Server

Consider removing Server A, which handles keys from 0 to 85. The remaining servers, B and C, will need to take over the key space of A:

- Server B: 0 to 127 (expands to cover from 0 to 85)

- Server C: 128 to 255 (unchanged)

Only the keys originally managed by Server A are reassigned to Server B, while the rest of the keys remain unchanged.

Scenario 3: Virtual Replicas for Load Balancing

To handle non-uniform data distribution, we can use virtual replicas. Instead of mapping each server to a single point on the ring, we map each server to multiple points:

- Server A might be mapped to points 10, 110, and 210.

- Server B might be mapped to points 30, 130, and 230.

- Server C might be mapped to points 50, 150, and 250.

With these virtual replicas, keys are distributed more evenly across the servers. This prevents any single server from becoming a hotspot, ensuring a balanced load even if the data is not uniformly distributed.

Comparing Traditional Hashing and Consistent Hashing

- Traditional Hashing (`key % n`):

- Adding a new server changes n, requiring all keys to be remapped.

- Example: With 3 servers and key % 3, keys 1, 2, 3 are mapped to servers 1, 2, 0 respectively. Adding a fourth server (key % 4) disrupts all mappings, causing extensive reorganization.

- Consistent Hashing:

- Only a small portion of keys are affected, allowing for seamless scaling and maintenance.

- Example: With servers A, B, C, and adding D as described above, only the keys between 128 and 170 are remapped, minimizing the impact on the overall system.

Consistent hashing is crucial for designing robust distributed caching systems and distributed hash tables. It ensures efficient scaling and load balancing, making your systems more resilient and easier to manage.

查看更多评论

要查看或添加评论，请登录

Jayaraj Singh的更多文章

Understanding Long-Polling, WebSockets, and Server-Sent Events: A Guide for Beginners

2024年7月13日

Understanding Long-Polling, WebSockets, and Server-Sent Events: A Guide for Beginners

In today's interconnected world, real-time communication between a client (like a web browser) and a server is…
?? Partition databases: here's why????

2024年5月31日

?? Partition databases: here's why????

In today's data-driven world, managing large-scale databases efficiently is crucial. Sharding, or data partitioning, is…

6 条评论

Mastering Data Distribution with Consistent Hashing

Jayaraj Singh

SDE 1 Backend Developer @ InterviewVector

领英推荐

Jayaraj Singh的更多文章

社区洞察

其他会员也浏览了

Monitoring Kubernetes Resources with the Metrics Server

What will happen if your data center catches fire? How AWS Region and AZs can help? what is CDN? let's understand in detail with real-life examples.

Azure File Sync : File Storage and Synchronization in Hybrid Environments

Azure Weekly Updates - 07th November 2021

IBM i on Power Virtual Server - Taking (consistent) snapshots

Azure Weekly Update

What are Load Balancers ???

Azure Weekly Updates - July 25th, 2022

领英推荐

Jayaraj Singh的更多文章

Understanding Long-Polling, WebSockets, and Server-Sent Events: A Guide for Beginners

?? Partition databases: here's why????

社区洞察

其他会员也浏览了

Monitoring Kubernetes Resources with the Metrics Server

What will happen if your data center catches fire? How AWS Region and AZs can help? what is CDN? let's understand in detail with real-life examples.

Azure File Sync : File Storage and Synchronization in Hybrid Environments

Azure Weekly Updates - 07th November 2021

IBM i on Power Virtual Server - Taking (consistent) snapshots

Azure Weekly Update

What are Load Balancers ???

Azure Weekly Updates - July 25th, 2022