Horizontal Partitioning, Scaling and Sharding - with Real Examples

Shrey Batra

Founder @ Cosmocloud ◆ Ex-LinkedIn ◆ Angel Investor ◆ MongoDB Champion ◆ Book Author ◆ Patent Holder (Distributed Algorithm)

Published: August 15, 2022

Everyone has heard of horizontal partitioning and sharding, but few people really understand them properly. Most people think sharding and replication help scalability. Probably true, but how? Let's take a deep look at it.

Your likes on this article would go a long way in helping me reach new folks!

What is Horizontal Partitioning?

Breaking something down into smaller chunks of the same entity is called horizontal partitioning.

Each partition is responsible for a subset of the data being processed or stored by the main entity. Let's look at some examples.

Horizontally Partitioning an SQL Table

Be it MySQL or PostgreSQL, in SQL-based databases we have tables. To partition a table (a single entity), we break it down into multiple smaller tables and distribute them across separate running instances/machines.

Let's say our main table holds information on 100 employees, each with an ID from 1 to 100. Let's partition (break) our table based on IDs into 10 equal partitions: 1-10, 11-20, 21-30 and so on.
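The ID ranges above can be sketched in a few lines of Python. This is only an illustration of range-based partitioning, not how MySQL or PostgreSQL implement it; the function name `partition_for` and the fixed partition size of 10 are assumptions made for this example.

```python
def partition_for(employee_id, partition_size=10):
    """Return the 0-based partition index for an employee ID (IDs start at 1)."""
    if employee_id < 1:
        raise ValueError("employee IDs start at 1")
    # IDs 1-10 map to partition 0, 11-20 to partition 1, and so on.
    return (employee_id - 1) // partition_size


# Place all 100 employee IDs into their partitions.
partitions = {}
for employee_id in range(1, 101):
    partitions.setdefault(partition_for(employee_id), []).append(employee_id)

for index in sorted(partitions):
    ids = partitions[index]
    print(f"partition {index}: IDs {ids[0]}-{ids[-1]}")
```

Each partition ends up holding exactly 10 consecutive IDs, so a lookup by ID only needs to touch one of the 10 smaller tables.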

Horizontally Partitioning a Kafka Topic

Kafka, like any other queue-like system, can also partition its queue (a Kafka topic). Rather than asking every producer to push to the same queue, and every consumer to read from the same queue, we can partition (break) the topic/queue into multiple smaller pieces, called partitions.

Just an FYI, a topic behaves differently from a queue, but I am using a similar analogy for now.

Now, how does partitioning in Kafka work? A hash function computes which partition each message goes to. Let's say we partition based on the message key, an identifier based on the producer ID. Now if we have 100 producers, we can distribute them roughly equally across, let's say, 5 partitions (smaller queues).

Each partition can have a separate consumer, which speeds up consumption through parallelisation.
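Here is a rough sketch of key-based partitioning, assuming 100 producers and 5 partitions as in the example above. Note the stand-in: Kafka's default partitioner hashes the key bytes with murmur2, while this sketch uses `zlib.crc32` from the Python standard library purely for illustration. The key format `producer-<n>` and the helper name `partition_for_key` are also assumptions.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 5  # assumed partition count from the example above


def partition_for_key(key, num_partitions=NUM_PARTITIONS):
    """Map a message key to a partition: hash(key) modulo partition count."""
    # crc32 is a stand-in hash; it is deterministic across runs and machines.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Count how many of the 100 hypothetical producers land on each partition.
counts = Counter(partition_for_key(f"producer-{i}") for i in range(100))
for partition in sorted(counts):
    print(f"partition {partition}: {counts[partition]} producers")
```

Because the hash is deterministic, every message with the same key lands on the same partition, which is what lets a single consumer see all messages for a given producer in order.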

Horizontally Partitioning a Backend API Service

Let's say you have 1 instance of your API server running and all 1 million requests go through that same instance. Think about the load that server would be facing!

Now, how can we distribute this load of requests? We can partition our requests and spin up 100 instances of the API server, so that each instance handles only about 10,000 requests.
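One simple way to spread requests is round-robin, sketched below. The article does not say which balancing strategy is used, so round-robin is an assumption here, as are the instance names `api-0` to `api-99` and the helper `distribute`.

```python
from collections import Counter
from itertools import cycle


def distribute(num_requests, instances):
    """Assign requests to instances in round-robin order and count the load."""
    handled = Counter()
    next_instance = cycle(instances)  # endlessly loops over the instance list
    for _ in range(num_requests):
        handled[next(next_instance)] += 1
    return handled


instances = [f"api-{i}" for i in range(100)]  # hypothetical instance names
load = distribute(1_000_000, instances)
print(load["api-0"], load["api-99"])
```

With 1 million requests and 100 instances, every instance is handed the same share, 1,000,000 / 100 = 10,000 requests.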

By now, you should understand what horizontal partitioning is.

What is Sharding?

Sharding is similar to horizontal partitioning of data, but it makes sure that each partition has its own CPU and memory allocated to it, and can live as a separate instance of the larger entity.

That means, if we break the table into multiple smaller tables, each table is placed on a separate database server altogether, be it on the same machine (containerised) or on separate machines (always preferable).

So what is Horizontal Scaling?

As you might have guessed, increasing or decreasing the number of partitions/shards in your system is known as horizontal scaling. Remember, whenever a partition moves, all of its data has to be transferred to another machine over the network, so these operations can be very costly!
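To get a feel for that cost, the sketch below counts how many records would have to move if we went from 10 to 11 shards. It assumes records are placed by `id % shard_count`, which is one common scheme and not necessarily what your database does; the helper name `moved_keys` is hypothetical.

```python
def moved_keys(keys, old_count, new_count):
    """Count keys whose shard changes when the shard count changes."""
    # A key has to be copied over the network whenever its shard index differs.
    return sum(1 for key in keys if key % old_count != key % new_count)


employee_ids = range(1, 101)
moved = moved_keys(employee_ids, old_count=10, new_count=11)
print(f"{moved} of {len(employee_ids)} records change shards")
```

Under this placement rule, adding a single shard relocates 91 of the 100 records; only IDs 1 to 9 stay where they were. That is why resharding is usually planned carefully rather than done casually.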

How is Replication different?

Replication should not be brought up when you are talking about scaling (other than in very limited scenarios). Just as partitioning/sharding is related to scaling, replication is related to availability. It is about making copies of the original data and spreading them across different machines, so that if the main instance goes down, a copy can take over as the primary instance.
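A toy model of that takeover is sketched below. Everything in it is an assumption made for illustration: the class name `ReplicatedStore`, copying every write synchronously to all live copies, and promoting the first surviving copy. Real databases use far more careful replication and leader-election protocols.

```python
class ReplicatedStore:
    """Toy key-value store kept as several identical copies."""

    def __init__(self, num_copies=3):
        self.copies = [{} for _ in range(num_copies)]
        self.alive = [True] * num_copies
        self.primary = 0  # index of the copy currently acting as primary

    def write(self, key, value):
        # Assumed synchronous replication: every live copy gets the write.
        for index, copy in enumerate(self.copies):
            if self.alive[index]:
                copy[key] = value

    def read(self, key):
        return self.copies[self.primary][key]

    def fail(self, index):
        """Mark a copy as down and promote a survivor if it was the primary."""
        self.alive[index] = False
        if index == self.primary:
            survivors = [i for i, ok in enumerate(self.alive) if ok]
            if not survivors:
                raise RuntimeError("no copies left to promote")
            self.primary = survivors[0]


store = ReplicatedStore()
store.write("employee:42", "Asha")
store.fail(0)  # the primary goes down
print(store.primary, store.read("employee:42"))
```

After the primary fails, reads are served by the promoted copy and the data is still there, which is the availability benefit the paragraph above describes.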

Can you have both Sharding and Replication?

Yes, why not? You can replicate the larger table as well as the smaller tables. Copying data is independent of the partitioning logic and should be kept that way (in most general cases).

How does Replication help in Scalability then?

I said "other than in very limited scenarios" above because replication can help scale the system in some cases. Think about read-heavy scenarios in MySQL where your replicas sit idle and cost money. Why not send your reads to the replicas to spread the load? If you have 3 replicas of each shard/partition/table and spread reads evenly across them, each replica serves about 1/3 of the original read load!

I'll be hosting a dedicated live session to help you understand how to use sharding and replication in your system design interviews. To attend, join the waitlist on Eazy Develop now!
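The read-spreading idea can be sketched as a tiny router: writes go to the primary, reads rotate across the replicas. The class name `ReadRouter`, the server names and the round-robin choice are all assumptions for illustration, and the sketch ignores replication lag, which in practice can make a replica return slightly stale data.

```python
from collections import Counter
from itertools import cycle


class ReadRouter:
    """Send writes to the primary and rotate reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._next_replica = cycle(replicas)

    def route(self, operation):
        if operation == "write":
            return self.primary
        return next(self._next_replica)


router = ReadRouter("primary", ["replica-1", "replica-2", "replica-3"])
reads = Counter(router.route("read") for _ in range(300))
print(dict(reads))
```

Out of 300 reads, each of the 3 replicas serves 100, so each one carries a third of the read traffic while the primary is left free for writes.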

Conclusion

You should now understand the main difference between horizontal partitioning, sharding and replication. We'll talk about vertical partitioning later on!

If you liked this short, sweet story, do like and share this article on LinkedIn.

Sakalya Deshpande

2 years ago

Most of the time, "Partition" and "Shard" are used interchangeably. Each data store has its own naming convention for it. For example, MongoDB and Elasticsearch use "Shard", Cassandra uses "Vnode", HBase uses "Region" and so on.
