Difference between Database Sharding and Partitioning

Difference between Database Sharding and Partitioning

Sharding and partitioning come in very handy when we want to scale our systems. Let's talk about these concepts in detail.

How is the database scaled?

A database server is just a database process (like MySQL, MongoDB) running on a virtual server like EC2. Now when we put our database in production it starts getting from real good traction, say 100 writes per second (WPS).

Steady user growth

Say, your product started getting some traction, and we find that the database is not able to handle the load, we scale it up by adding more CPU, RAM, and Disk to the server. This way we are now handling 200 WPS.

More read traffic

If we see nor reads then can also choose to add a Read Replica and divert some of the read traffic to this node, while the master node can take in 200 WPS.

Viral Growth

Say, your product went viral and you now got 5x more load which means now you have to handle 1000 WPS. To achieve this you again scale it up vertically and handle the desired load.

Insane growth

Say, you now cracked the PMF and are getting some really solid traction and need to handle 1500 WPS, and when you visit the database console you found out that it is not possible to vertically scale your database any further, so how do you handle 1500 WPS?

This is where the horizontal scaling comes into the picture.

Scaling the database horizontally

We know one database server can handle 1000 WPS, but we need to handle 1500 WPS, so we split the data into half and split it across two databases such that each database owns half of the data and all the writes for that data goes to that particular instance.

This way each server will get 750 WPS, which it can very easily handle, and owns 50% of the data. Thus by adding more database servers we handled 1500 WPS (more than what a single machine could handle)

Sharding and Partitioning

Each database server in the above architecture is called a Shard while the data is said to be partitioned. Overall, a database is sharded and the data is partitioned.

Partitioned data on shards

It is possible to have more partitions and fewer shards and in that case, each shard will own multiple partitions. Say, we have 100GB of data and it is split into 5 partitions and we have 2 shards. One shard will be responsible for 3 partitions while the other for 2.

Advantages and Disadvantages

Advantages of Sharding

  • handle more reads and writes
  • increases overall storage capacity
  • overall high availability

Disadvantages of Sharding

  • sharding is operationally complex
  • cross-shard queries are super-expensive

Here's the video of my explaining this in-depth ?? do check it out

Sharding and partitioning come in very handy when we want to scale our systems. These concepts operate on the database and help us improve the overall throughput and availability of the system.

In this video, we take a detailed look into how a database is scaled and evolved through different stages, what sharding and partitioning are, understand the difference between them, see at which stage should we introduce this complexity, and a few advantages and disadvantages of adopting them.

Outline:

  • 00:00 Introduction and Agenda
  • 03:05 How a database is progressively scaled?
  • 08:10 Scaling beyond the limit of vertical scaling
  • 11:57 Sharding vs Partitioning
  • 12:43 Example of Data Partitioning
  • 17:15 Sharding and Partitioning together
  • 20:20 Advantages and Disadvantages of Sharding and Partitioning

You can also

Thank you so much for reading ?? If you found this helpful, do spread the word about it on social media; it would mean the world to me.

You can also follow me on your favourite social media LinkedIn, and Twitter.

Yours truly,

Arpit

arpitbhayani.me

Until next time, stay awesome :)

No alt text provided for this image

I teach a course on System Design where you'll learn how to intuitively design scalable systems. The course will help you

  • become a better engineer
  • ace your technical discussions
  • get you acquainted with a massive spectrum of topics ranging from Storage Engines, High-throughput systems, to super-clever algorithms behind them.

I have compressed my ~10 years of work experience into this course, and aim to accelerate your engineering growth 100x. To date, the course is trusted by 600+ engineers from 10 different countries and here you can find what they say about the course.

Together, we will build some of the most amazing systems and dissect them to understand the intricate details. You can find the week-by-week curriculum and topics, benefits, testimonials, and other information here https://arpitbhayani.me/masterclass.

Ajay Saini

Software Engineer @ Adobe | Building AI products

2 年

Such an appropriate illustration.

Shrey Batra

CEO @ Cosmocloud | Ex-LinkedIn | Angel Investor | MongoDB Champion | Book Author | Patent Holder (Distributed Algorithms)

2 年

++ for my community

Pramod Agarwal

Specialist in Business Development | Operations and Commercial | Strategic Planning

2 年

Thank you for post

POOJA JAIN

Storyteller | Linkedin Top Voice 2024 | Senior Data Engineer@ Globant | Linkedin Learning Instructor | 2xGCP & AWS Certified | LICAP'2022

2 年

Informative share ???? Arpit Bhayani

Prakhar Saxena

Software Engineer III at Wayfair | IIT Indore | Ex-Grab | Ex-Adobe

2 年

We should specifically mention here that in partitioning , the partitions lies within a single database instance whereas in sharding the shards lies across different database servers. I feel that is the main difference and not highlighted by this article.

要查看或添加评论,请登录

Arpit Bhayani的更多文章

  • The Ideal End To An Ideal Career

    The Ideal End To An Ideal Career

    This edition of the newsletter contains one quick write-up that will help you grow faster in your career a video I…

    6 条评论
  • How to Find and Ride the Next Tech Wave

    How to Find and Ride the Next Tech Wave

    This edition of the newsletter contains one quick write-up that will help you grow faster in your career a video I…

    6 条评论
  • Engineer or Manager? How to Decide Your Path

    Engineer or Manager? How to Decide Your Path

    This edition of the newsletter contains one quick write-up that will help you grow faster in your career a video I…

    7 条评论
  • One Career Bet Worth Taking

    One Career Bet Worth Taking

    This edition of the newsletter contains one quick write-up that will help you grow faster in your career a video I…

    5 条评论
  • Leave your job with grace and gratitude

    Leave your job with grace and gratitude

    This edition of the newsletter contains one quick write-up that will help you grow faster in your career a video I…

    7 条评论
  • Turn Boring Projects into Opportunities

    Turn Boring Projects into Opportunities

    This edition of the newsletter contains one quick write-up that will help you grow faster in your career a video I…

    1 条评论
  • When is the right time to switch?

    When is the right time to switch?

    This edition of the newsletter contains one quick write-up that will help you grow faster in your career a video I…

    8 条评论
  • Ramping up faster in your new job

    Ramping up faster in your new job

    This edition of the newsletter contains one quick write-up that will help you grow faster in your career a video I…

    4 条评论
  • Back Your Disagreement with Data

    Back Your Disagreement with Data

    This edition of the newsletter contains one quick write-up that will help you grow faster in your career a video I…

    2 条评论
  • Doubt yourself every day

    Doubt yourself every day

    This edition of the newsletter contains one quick write-up that will help you grow faster in your career a video I…

    9 条评论

社区洞察

其他会员也浏览了