Mastering Data Distribution with Consistent Hashing
Building scalable and fault-tolerant systems requires efficiently distributing data across multiple servers. Traditional hashing methods like key % n often disrupt mappings when scaling, leading to significant maintenance challenges. This is where consistent hashing shines.
The Advantages of Consistent Hashing
- Minimised Reorganisation: Only a small fraction of keys need to be remapped when nodes are added or removed, unlike traditional hashing.
- Enhanced Load Balancing: Virtual replicas ensure even data distribution, preventing some servers from becoming overloaded.
How Consistent Hashing Operates:
1. Server Hashing: Servers are hashed to points on a circular ring.
2. Key-to-Server Mapping: Keys are hashed and assigned to the nearest server in a clockwise direction.
3. Dynamic Scaling: Adding or removing servers only requires reassignment of keys near the affected area.
4. Virtual Replicas for Load Balancing: Servers are mapped to multiple points on the ring for better distribution.
Detailed Example:
Scenario 1: Adding a New Server
Imagine three servers, A, B, and C, hashed to points on a ring from 0 to 256. The keys are distributed as follows:
- Server A handles keys from 0 to 85.
- Server B handles keys from 86 to 170.
- Server C handles keys from 171 to 255.
Now, we add a new server, D, which hashes to point 128. The new distribution would be:
- Server A: 0 to 85 (unchanged)
- Server B: 86 to 127
- Server D: 128 to 170 (takes over some key space from B)
- Server C: 171 to 255 (unchanged)
领英推荐
In this scenario, only the keys from Server B that fall between 128 and 170 need to be reassigned to Server D. The rest of the keys remain unaffected, showcasing the efficiency of consistent hashing in minimizing disruption.
Scenario 2: Removing a Server
Consider removing Server A, which handles keys from 0 to 85. The remaining servers, B and C, will need to take over the key space of A:
- Server B: 0 to 127 (expands to cover from 0 to 85)
- Server C: 128 to 255 (unchanged)
Only the keys originally managed by Server A are reassigned to Server B, while the rest of the keys remain unchanged.
Scenario 3: Virtual Replicas for Load Balancing
To handle non-uniform data distribution, we can use virtual replicas. Instead of mapping each server to a single point on the ring, we map each server to multiple points:
- Server A might be mapped to points 10, 110, and 210.
- Server B might be mapped to points 30, 130, and 230.
- Server C might be mapped to points 50, 150, and 250.
With these virtual replicas, keys are distributed more evenly across the servers. This prevents any single server from becoming a hotspot, ensuring a balanced load even if the data is not uniformly distributed.
Comparing Traditional Hashing and Consistent Hashing
- Traditional Hashing (`key % n`):
- Adding a new server changes n, requiring all keys to be remapped.
- Example: With 3 servers and key % 3, keys 1, 2, 3 are mapped to servers 1, 2, 0 respectively. Adding a fourth server (key % 4) disrupts all mappings, causing extensive reorganization.
- Consistent Hashing:
- Only a small portion of keys are affected, allowing for seamless scaling and maintenance.
- Example: With servers A, B, C, and adding D as described above, only the keys between 128 and 170 are remapped, minimizing the impact on the overall system.
Consistent hashing is crucial for designing robust distributed caching systems and distributed hash tables. It ensures efficient scaling and load balancing, making your systems more resilient and easier to manage.