Consistent Hashing

It is one of the system design techniques used to optimize performance issues with horizontal hashing while scaling.

The issue: We have loads of data and it needs to map through the nodes properly so that the accessibility of the data would be easy.

This will also be solved if we use the normal hash functions to map the data to the nodes.

But why Consistent Hashing then?

The issue with normal hashing would come into play when we are about to add a new node or any node would be down. In this case, the whole data has to be shifted accordingly since the number of nodes would be affected and hence the hash will get changed for all the data.

Consistent hashing would help to solve the issue by moving the data of the failed node to the nearest adjacent node/successor. So only a small portion of the data will be affected. Similarly, in case of the addition of a new node, the nearest node some data would be transferred to the new node.

There is still an issue with Consistent Hashing, i.e. uneven distribution of data among the nodes. This can be solved with the concept of virtual nodes(The technique of assigning multiple positions to a node is known as a?virtual node). and the position depends on the capacity of the node.

Implementation is based on a Binary Search Tree.

Binary Search Tree is used to store the position of nodes in the hash ring where all the nodes are positioned.

Centralized, so gets stored on each node and it gets synchronized through state information transfer. Gossip Protocol is the one used here.

The hashing algorithms used: Murmur Hash, MD5 Hash.

Applications are DynamoDB an AWS service, Cassandra DB, VideoSteaming Platforms, etc.

One thing to note: Replication and partitioning are orthogonal to each other.

**Orthogonal features: features that are related but not identical. Removal of a feature and making other features powerful and general enough to handle all the general use cases.

For More refer to article with DynamoDB and consistent hashing: https://medium.com/@adityashete009/consistent-hashing-amazon-dynamodb-part-1-f5719aff7681



要查看或添加评论,请登录

Pushkal Goyal的更多文章

  • Partitioning & Replication

    Partitioning & Replication

    Partitioning: Process of dividing data into independent segments. Need of Partitioning: If the data is served without…

  • Merkle Tree :

    Merkle Tree :

    A concept popular in Distributed Systems. Merkle Tree is a binary tree used for easy search and secure verification of…

  • Interpreter and Compiler

    Interpreter and Compiler

    Python => interpreter & other languages like CPP => compiler-based languages. I didn't know the way I could see the…

  • Circular Imports in Python

    Circular Imports in Python

    **packages : directory with __init__.py **modules : files with .

  • POSIX

    POSIX

    It stands for Portable Operating System Interface. This is a compliance introduced by the US government for procurement…

社区洞察

其他会员也浏览了