Consistent Hashing: A Guide for Distributed Systems

Bhuvnesh Arya

Software Architect | IoT, Cloud and Software Engineering Leader | Technical Mentor | Building Next-Gen Software Solutions

Published: February 27, 2024

Introduction:

In distributed systems, efficiently distributing data across multiple nodes is crucial for scalability and fault tolerance. Traditional modulo-based hashing makes dynamic scaling difficult: when the number of nodes changes, most keys map to a different node and have to be moved. Consistent hashing is a technique that addresses this problem by keeping most keys in place when nodes join or leave.

What is Hashing?

Hashing is a fundamental concept in computer science used for mapping data of arbitrary size to fixed-size values. This process involves applying a hash function to data, producing a hash value or hash code. Hash functions ensure that data mapping is deterministic and efficient, making them invaluable for indexing and retrieving data.

Distributed Hashing:

Distributed hashing extends traditional hashing to spread data across multiple nodes in a distributed system. Each node is responsible for a portion of the data, determined by applying a hash function to a key and mapping the result to a node, typically as hash modulo the number of nodes. This works well while the set of nodes is fixed, but because the mapping depends on the node count, adding or removing a node changes the assignment of most keys.

Server Selection:

In this table, each row represents a key-value pair, where the key is the data being hashed, the Hash column displays the hash value calculated for each key, and the Hash%Server Number column represents the modulo operation of the hash value with the number of servers to determine the server responsible for storing the key.
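The mapping described above can be sketched in a few lines. This is only an illustration: the class name ModuloServerTable, the sample keys, and the server count of 3 are assumptions, and it uses Math.floorMod to keep the index non-negative, which can pick a different server than Math.abs(hash % n) when the hash is negative.

```java
import java.util.List;

final class ModuloServerTable {
    // Hypothetical server count, chosen only for this illustration.
    static final int NUM_SERVERS = 3;

    // Maps a key to a server index in [0, numServers) as hash modulo server count.
    public static int serverFor(String key, int numServers) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numServers);
    }

    public static void main(String[] args) {
        // Hypothetical keys; any strings would do.
        List<String> keys = List.of("user:1", "user:2", "order:17", "cart:42");
        System.out.printf("%-10s %12s %s%n", "Key", "Hash", "Hash % " + NUM_SERVERS);
        for (String key : keys) {
            System.out.printf("%-10s %12d %d%n", key, key.hashCode(), serverFor(key, NUM_SERVERS));
        }
    }
}
```

Running main prints one row per key with the three columns of the table: the key, its hash, and the resulting server index.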

Code Snippet - Distributed Hashing Using Hash Table

import java.util.HashMap;
import java.util.Map;

public class DistributedHashing {
    // Define the number of nodes in the distributed system
    private static final int NUM_NODES = 5;

    // Create hash tables for each node
    private static final Map<Integer, Map<String, Object>> nodes = new HashMap<>();

    static {
        // Initialize hash tables for each node
        for (int i = 0; i < NUM_NODES; i++) {
            nodes.put(i, new HashMap<>());
        }
    }

    // Method to determine the node responsible for a given key
    private static int getNodeForKey(String key) {
        int hashCode = key.hashCode();
        return Math.abs(hashCode % NUM_NODES);
    }

    // Method to put data into the distributed hash table
    public static void put(String key, Object value) {
        int nodeIndex = getNodeForKey(key);
        Map<String, Object> node = nodes.get(nodeIndex);
        node.put(key, value);
    }

    // Method to retrieve data from the distributed hash table
    public static Object get(String key) {
        int nodeIndex = getNodeForKey(key);
        Map<String, Object> node = nodes.get(nodeIndex);
        return node.get(key);
    }

    // Method to remove data from the distributed hash table
    public static void remove(String key) {
        int nodeIndex = getNodeForKey(key);
        Map<String, Object> node = nodes.get(nodeIndex);
        node.remove(key);
    }
}
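To see why this scheme struggles when the cluster changes size, the following sketch counts how many keys land on a different node when the node count goes from 5 to 6. The class name ModuloRemapDemo, the synthetic keys "key-0", "key-1", ..., and the key count are assumptions made for the illustration; it also uses Math.floorMod rather than Math.abs for the index.

```java
final class ModuloRemapDemo {
    // Same idea as getNodeForKey above: hash modulo the number of nodes.
    public static int nodeFor(String key, int numNodes) {
        return Math.floorMod(key.hashCode(), numNodes);
    }

    // Counts how many of the keys "key-0" .. "key-(numKeys-1)" map to a
    // different node index when the node count changes from oldNodes to newNodes.
    public static int countMoved(int numKeys, int oldNodes, int newNodes) {
        int moved = 0;
        for (int i = 0; i < numKeys; i++) {
            String key = "key-" + i;
            if (nodeFor(key, oldNodes) != nodeFor(key, newNodes)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int numKeys = 10_000; // hypothetical workload size
        int moved = countMoved(numKeys, 5, 6);
        System.out.printf("%d of %d keys changed node (%.1f%%)%n",
                moved, numKeys, 100.0 * moved / numKeys);
    }
}
```

A key keeps its node only if its hash gives the same remainder modulo 5 and modulo 6. For uniformly distributed hashes that happens for about one key in six, so roughly five sixths of the data would be expected to move; the exact figure for these synthetic keys depends on how String.hashCode spreads them.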

Consistent Hashing:

Consistent hashing addresses the limitations of modulo-based distributed hashing. Both nodes and keys are hashed onto the same circular hash space, the hash ring. A key is assigned to the first node encountered when moving clockwise from the key's position, wrapping around at the end of the ring. To even out the distribution, each physical node is usually placed on the ring several times as virtual nodes. When a node is added or removed, only the keys in the affected segments of the ring are reassigned, which makes the approach well suited to dynamic environments.

Server Selection:

Each server is represented as NodeX#Y, where X is the server identifier and Y is the virtual node identifier.

Code Snippet - Consistent Hashing Using Hash Ring

import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashing {
    // Hash ring: node hash -> node name, kept sorted by hash.
    // This simplified version places each node once (no virtual nodes).
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    // Method to add a node to the hash ring
    public void addNode(String node) {
        int hash = node.hashCode();
        ring.put(hash, node);
    }

    // Method to remove a node from the hash ring
    public void removeNode(String node) {
        int hash = node.hashCode();
        ring.remove(hash);
    }

    // Method to find the node responsible for a given key
    public String getNodeForKey(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        int hash = key.hashCode();
        SortedMap<Integer, String> tailMap = ring.tailMap(hash); // Nodes whose hash is >= the key's hash
        if (tailMap.isEmpty()) {
            return ring.get(ring.firstKey()); // No such node: wrap around to the first node on the ring
        }
        return tailMap.get(tailMap.firstKey()); // First node at or after the key's position on the ring
    }
}
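The snippet above places each node on the ring once. A minimal sketch of the NodeX#Y virtual-node scheme might look like the following. The class name VirtualNodeRing, the replicas parameter, and the choice of the first 8 bytes of an MD5 digest as the ring position are all assumptions for illustration, not the article's implementation; MD5 is used here only to spread positions, not for security.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class VirtualNodeRing {
    // Ring position -> physical node name, sorted by position.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    // Virtual nodes per physical node (hypothetical tuning parameter).
    private final int replicas;

    public VirtualNodeRing(int replicas) {
        if (replicas <= 0) {
            throw new IllegalArgumentException("replicas must be positive");
        }
        this.replicas = replicas;
    }

    // Ring position of a string: the first 8 bytes of its MD5 digest as a long.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Places the node on the ring once per virtual node, labelled node#0, node#1, ...
    public void addNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Removes the node's virtual nodes; remove(key, value) leaves other nodes' entries alone.
    public void removeNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.remove(hash(node + "#" + i), node);
        }
    }

    // Returns the owner of the first virtual node at or after the key's position,
    // wrapping around to the start of the ring; null if the ring is empty.
    public String getNodeForKey(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

With, say, new VirtualNodeRing(100), each physical node owns 100 small arcs scattered around the ring instead of one large arc, which tends to even out the share of keys per node. The sketch ignores the unlikely case of two virtual nodes hashing to the same position.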

Advantages of Consistent Hashing:

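Scalability: When a node is added or removed, only the keys in the affected segments of the ring move, on average about K/N keys for K keys and N nodes, instead of most of the data set.
Load Balancing: With a well-distributed hash function and enough virtual nodes per server, keys are spread roughly evenly across nodes, which balances load and improves performance.
Fault Tolerance: When a node fails, only the keys it owned are reassigned, to the next nodes on the ring, while all other keys stay where they are. Consistent hashing does not by itself protect the failed node's data; that requires replication.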
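The scalability claim can be checked with a small simulation: build a ring with five nodes, record each key's owner, add a sixth node, and count the keys whose owner changed. Everything named here (RingRebalanceDemo, VNODES, the MD5-based positions, the synthetic keys) is an assumption made for the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class RingRebalanceDemo {
    static final int VNODES = 100; // virtual nodes per server (hypothetical value)

    // Ring position: the first 8 bytes of the MD5 digest as a long.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    static void addNode(TreeMap<Long, String> ring, String node) {
        for (int i = 0; i < VNODES; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    static String ownerOf(TreeMap<Long, String> ring, String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Returns {keys whose owner changed, keys whose new owner is the added node}.
    public static int[] simulate(int numKeys) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (int n = 1; n <= 5; n++) {
            addNode(ring, "Node" + n);
        }
        String[] before = new String[numKeys];
        for (int i = 0; i < numKeys; i++) {
            before[i] = ownerOf(ring, "key-" + i);
        }
        addNode(ring, "Node6");
        int moved = 0;
        int movedToNew = 0;
        for (int i = 0; i < numKeys; i++) {
            String after = ownerOf(ring, "key-" + i);
            if (!after.equals(before[i])) {
                moved++;
                if (after.equals("Node6")) {
                    movedToNew++;
                }
            }
        }
        return new int[] {moved, movedToNew};
    }

    public static void main(String[] args) {
        int numKeys = 10_000;
        int[] result = simulate(numKeys);
        System.out.printf("moved: %d of %d, moved to Node6: %d%n",
                result[0], numKeys, result[1]);
    }
}
```

Every key that moves ends up on the new node, because adding positions to the ring can only change a key's successor to one of the new positions. With six equally weighted nodes, about one sixth of the keys would be expected to move on average, compared with the large majority under modulo hashing.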

Usage and Use Cases:

Consistent hashing finds applications in various distributed systems, including:

Content Distribution Networks (CDNs): Efficiently distribute content across edge servers.
Distributed Caching: Distribute cached data across cache nodes for improved performance.
Key-Value Stores: Map keys to storage nodes in distributed key-value stores for efficient data retrieval.

Limitations:

While consistent hashing offers numerous advantages, it's essential to consider its limitations:

Skewed Data Distribution: With few nodes, too few virtual nodes, or a small number of very popular keys, some nodes can end up with a noticeably larger share of data or traffic, requiring additional load-balancing techniques.
Implementation Complexity: Implementing consistent hashing algorithms may introduce complexity compared to traditional hashing methods.

Conclusion:

Consistent hashing is a powerful technique for efficiently distributing data in distributed systems. By overcoming the limitations of traditional hashing methods, it enables scalable, fault-tolerant architectures. Understanding consistent hashing and its applications is essential for building robust distributed systems in modern computing environments.
