Sharding
Gen. using ChatGpt

Sharding

As you know, any system backbone is data and data grows everyday. According to statistics, by year 2028, the projected worldwide volume of the data can grow upto 394 Zettabytes (reg. see statistics). In simple language, you almost need around 98,500 DVDs to store that many of data.

The reason to giving above information to make you understand, how data is growing and what you can do to handle if you have large volume of data to process through your application. I've collected some good information that explains, how you can achieve scaling in your application; and one of the popular concept I’m going to discuss here called Sharding.

Sharding

Database sharding is a horizontal partitioning strategy where a large database is divided into smaller, more manageable pieces called shards. Each shard is an independent database that contains a subset of the data, allowing the overall system to scale by distributing the load across multiple servers.

Sharding is important when you want to achieve scalability due to large volume of data, improve the performance, manageable and cost effective.


Sharding Strategies

1. Range-Based Sharding:

  • Data is divided into shards based on ranges of the sharding key (e.g., user IDs 1–1000 go to shard A, 1001–2000 go to shard B).
  • Pros: Easy to implement and understand.
  • Cons: Uneven data distribution may lead to hotspots.


2. Hash-Based Sharding:

  • The sharding key is hashed, and the hash value determines the shard.
  • Pros: Provides uniform data distribution.
  • Cons: Harder to re-shard when adding new shards.


3. Geographic Sharding:

  • Data is divided based on geographical regions.
  • Pros: Useful for applications with distinct regional usage patterns.
  • Cons: Inefficient if data is accessed globally.


4. Directory-Based Sharding:

  • A directory keeps track of which shard contains which data.
  • Pros: Flexible and dynamic.
  • Cons: Directory maintenance adds complexity.


Setting up database sharding locally using SQL Server involves creating multiple databases (shards) and distributing the data among them. SQL Server doesn't have built-in sharding support like some other databases, so you'll have to implement sharding logic yourself. Here's a step-by-step guide:


Step 1: Install SQL Server

Download and Install SQL Server:

  • Download the installer from the official Microsoft website.
  • Follow the installation instructions to install SQL Server on your local machine.

Install SQL Server Management Studio (SSMS):

Step 2: Create Shard Databases

  1. Open SQL Server Management Studio (SSMS).
  2. Connect to Your SQL Server Instance.
  3. Create Multiple Databases (shards). For example, create three databases:

CREATE DATABASE Shard1;
CREATE DATABASE Shard2;
CREATE DATABASE Shard3;        

Step 3: Create Schema and Tables in Each Shard

USE Shard1;
CREATE TABLE Customers (
    CustomerId INT PRIMARY KEY,
    Name NVARCHAR(100),
    Email NVARCHAR(100)
);

USE Shard2;
CREATE TABLE Customers (
    CustomerId INT PRIMARY KEY,
    Name NVARCHAR(100),
    Email NVARCHAR(100)
);

USE Shard3;
CREATE TABLE Customers (
    CustomerId INT PRIMARY KEY,
    Name NVARCHAR(100),
    Email NVARCHAR(100)
);        

Step 4: Distribute Data Across Shards

Implement a sharding strategy to distribute data across the shards. Common strategies include range-based sharding, hash-based sharding, and list-based sharding.

Example: Range-Based Sharding:

  • Distribute customers based on CustomerId.

-- Insert into Shard1
USE Shard1;
INSERT INTO Customers (CustomerId, Name, Email)
VALUES (1, 'John Doe', '[email protected]');

-- Insert into Shard2
USE Shard2;
INSERT INTO Customers (CustomerId, Name, Email)
VALUES (101, 'Jane Smith', '[email protected]');

-- Insert into Shard3
USE Shard3;
INSERT INTO Customers (CustomerId, Name, Email)
VALUES (201, 'Mike Johnson', '[email protected]');        

Step 5: Implement Routing Logic in Your Application

Update your application to route queries to the appropriate shard based on the sharding strategy. Here's a basic example in C#:

using System;
using System.Data.SqlClient;

public class ShardingExample
{
    private static string GetConnectionString(int customerId)
    {
        if (customerId <= 100)
            return "Server=localhost;Database=Shard1;Integrated Security=True;";
        else if (customerId <= 200)
            return "Server=localhost;Database=Shard2;Integrated Security=True;";
        else
            return "Server=localhost;Database=Shard3;Integrated Security=True;";
    }

    public static void InsertCustomer(int customerId, string name, string email)
    {
        var connectionString = GetConnectionString(customerId);
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            var command = new SqlCommand("INSERT INTO Customers (CustomerId, Name, Email) VALUES (@CustomerId, @Name, @Email)", connection);
            command.Parameters.AddWithValue("@CustomerId", customerId);
            command.Parameters.AddWithValue("@Name", name);
            command.Parameters.AddWithValue("@Email", email);
            command.ExecuteNonQuery();
        }
    }

    public static void Main(string[] args)
    {
        InsertCustomer(1, "John Doe", "[email protected]");
        InsertCustomer(101, "Jane Smith", "[email protected]");
        InsertCustomer(201, "Mike Johnson", "[email protected]");
    }
}        

Step 6: Test Your Sharding Setup

  1. Insert Test Data: Run your application to insert test data into the appropriate shards.
  2. Query Data: Verify that data is correctly distributed across the shards.
  3. Check Performance: Measure query performance to ensure that sharding improves scalability and performance.

Conclusion

Setting up database sharding locally using SQL Server involves creating multiple databases to act as shards, implementing a sharding strategy, and updating your application to route queries to the appropriate shard. This approach helps distribute the load and improves performance and scalability.


要查看或添加评论,请登录

Rajeev Singh的更多文章

  • Your local AI Chat

    Your local AI Chat

    Are you concern about your data security and also wanted to use power of ChatGPT kind of great tool but fearing to give…

  • Master-Slave Databases

    Master-Slave Databases

    When data grows in your application, the performance deteriorates and that's true. You can't stop data from developing…

  • How Well Do You Define Your APIs?

    How Well Do You Define Your APIs?

    API stands for Application Programming Interface, but have you ever wondered how the concept originated? Let’s take a…

  • HTTP Protocol

    HTTP Protocol

    What is HTTP and how it works? Http developed by Tim Berners-Lee when he was in CERN year 1989-1991. Everyone has…

    1 条评论
  • Change Data Capture (CDC)

    Change Data Capture (CDC)

    Change data capture utilizes the SQL Server Agent to log insertions, updates, and deletions occurring in a table. So…

  • On-Prem Database Deployment (DevOps)

    On-Prem Database Deployment (DevOps)

    Introduction If you are a developer and in the tech world, you must have heard about databases. This is the heart of…

    1 条评论
  • Docker - Easy Way

    Docker - Easy Way

    Definition Docker is a powerful tool that provides a platform to package solutions for deployment. We can think like a…

    1 条评论

社区洞察

其他会员也浏览了