Difference between Database Sharding and Partitioning
Sharding and partitioning come in very handy when we want to scale our systems. Let's talk about these concepts in detail.
How is the database scaled?
A database server is just a database process (like MySQL, MongoDB) running on a virtual server like EC2. Now when we put our database in production it starts getting from real good traction, say 100 writes per second (WPS).
Steady user growth
Say, your product started getting some traction, and we find that the database is not able to handle the load, we scale it up by adding more CPU, RAM, and Disk to the server. This way we are now handling 200 WPS.
More read traffic
If we see nor reads then can also choose to add a Read Replica and divert some of the read traffic to this node, while the master node can take in 200 WPS.
Viral Growth
Say, your product went viral and you now got 5x more load which means now you have to handle 1000 WPS. To achieve this you again scale it up vertically and handle the desired load.
Insane growth
Say, you now cracked the PMF and are getting some really solid traction and need to handle 1500 WPS, and when you visit the database console you found out that it is not possible to vertically scale your database any further, so how do you handle 1500 WPS?
This is where the horizontal scaling comes into the picture.
Scaling the database horizontally
We know one database server can handle 1000 WPS, but we need to handle 1500 WPS, so we split the data into half and split it across two databases such that each database owns half of the data and all the writes for that data goes to that particular instance.
This way each server will get 750 WPS, which it can very easily handle, and owns 50% of the data. Thus by adding more database servers we handled 1500 WPS (more than what a single machine could handle)
Sharding and Partitioning
Each database server in the above architecture is called a Shard while the data is said to be partitioned. Overall, a database is sharded and the data is partitioned.
Partitioned data on shards
It is possible to have more partitions and fewer shards and in that case, each shard will own multiple partitions. Say, we have 100GB of data and it is split into 5 partitions and we have 2 shards. One shard will be responsible for 3 partitions while the other for 2.
Advantages and Disadvantages
Advantages of Sharding
领英推荐
Disadvantages of Sharding
Here's the video of my explaining this in-depth ?? do check it out
Sharding and partitioning come in very handy when we want to scale our systems. These concepts operate on the database and help us improve the overall throughput and availability of the system.
In this video, we take a detailed look into how a database is scaled and evolved through different stages, what sharding and partitioning are, understand the difference between them, see at which stage should we introduce this complexity, and a few advantages and disadvantages of adopting them.
Outline:
You can also
Thank you so much for reading ?? If you found this helpful, do spread the word about it on social media; it would mean the world to me.
Yours truly,
Arpit
Until next time, stay awesome :)
I teach a course on System Design where you'll learn how to intuitively design scalable systems. The course will help you
I have compressed my ~10 years of work experience into this course, and aim to accelerate your engineering growth 100x. To date, the course is trusted by 600+ engineers from 10 different countries and here you can find what they say about the course.
Together, we will build some of the most amazing systems and dissect them to understand the intricate details. You can find the week-by-week curriculum and topics, benefits, testimonials, and other information here https://arpitbhayani.me/masterclass.
Software Engineer @ Adobe | Building AI products
2 年Such an appropriate illustration.
CEO @ Cosmocloud | Ex-LinkedIn | Angel Investor | MongoDB Champion | Book Author | Patent Holder (Distributed Algorithms)
2 年++ for my community
Specialist in Business Development | Operations and Commercial | Strategic Planning
2 年Thank you for post
Storyteller | Linkedin Top Voice 2024 | Senior Data Engineer@ Globant | Linkedin Learning Instructor | 2xGCP & AWS Certified | LICAP'2022
2 年Informative share ???? Arpit Bhayani
Software Engineer III at Wayfair | IIT Indore | Ex-Grab | Ex-Adobe
2 年We should specifically mention here that in partitioning , the partitions lies within a single database instance whereas in sharding the shards lies across different database servers. I feel that is the main difference and not highlighted by this article.