Boosting Database Efficiency: The Power of Partitioning and Sharding

Ahmed Adel

Senior Software Engineer at ITQAN Systems | JavaScript | NodeJS | React

Published: September 14, 2024

Introduction

Partitioning and sharding are essential techniques for managing large databases. They address the need for improved performance, scalability, and availability by distributing data across multiple servers or splitting large tables into smaller, more manageable units.

Why Partitioning and Sharding?

1. Large Databases: Distributing extensive datasets across multiple servers enhances performance by allowing queries to target smaller datasets.

2. Large Tables: Techniques are needed to enhance query performance on large tables by scanning smaller segments.

3. Wide Tables: Some tables have numerous columns, and not all are accessed in every query. We need methods to optimize these queries.

4. Scalability and Availability: Enhancing the scalability, availability, and performance of the database is crucial.

Techniques for Improvement

To address these needs, options include:

Indexing: Improves query performance.
CQRS (Command Query Responsibility Segregation): Splits read and write operations.
Partitioning and Sharding: Distributing data for optimization.

Single Server Partitioning

In this technique, large tables are split into smaller ones on the same server. Tables can be split horizontally, dividing the rows among multiple tables that each hold a smaller volume of data, or vertically, dividing a table with many columns into tables with fewer columns. Some databases, such as PostgreSQL, support partitioning natively.

Advantages

Performance: Enhances query execution time by limiting the scanned dataset.
Simplicity: Easier to implement than partitioning across multiple servers.

Disadvantages

Scalability: Limited by the maximum resources of the server (vertical scaling).
Single point of failure: The entire dataset lives on one server, which makes that server a single point of failure. With multiple servers partitioning, by contrast, the dataset is distributed across servers, so if one server goes down, only its portion of the data becomes inaccessible while the rest of the system keeps working.

Multiple Servers Partitioning

This technique distributes a large database across multiple servers, with each server holding a part of the database.

Advantages

Scalability: To scale the database, more servers are added (horizontal scaling), so scaling is not limited by the maximum resources of one server, as it is in single server partitioning.
Performance: Queries execute faster because each one scans a smaller dataset, and the query load is distributed across all servers.

Disadvantages

Complexity: Increased complexity in application design.
Consistency: Maintaining data consistency is challenging.
Latency: Accessing multiple servers can increase latency.

Horizontal Partitioning

Data is partitioned at the row level, with each group of rows referred to as a partition. This can occur on a single server or across multiple servers.

Types of horizontal partitioning

Range partitioning: The data is partitioned based on ranges of values.

Example

In a chat application with a large number of messages, most users access only the most recent messages. We can therefore partition on the sent_at (date) column and use one month as the partitioning range, so that each message is routed to a specific partition according to its sent_at value.
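The monthly routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the partition naming scheme (messages_YYYY_MM), the in-memory `partitions` dictionary, and the helper names are hypothetical stand-ins, not any database's API.

```python
from datetime import datetime

# Hypothetical in-memory stand-in for the monthly partitions.
partitions = {}


def partition_for(sent_at: datetime) -> str:
    """Return the name of the monthly partition for a message timestamp."""
    return f"messages_{sent_at.year:04d}_{sent_at.month:02d}"


def insert_message(message: dict) -> str:
    """Route a message to its partition based on sent_at and store it there."""
    name = partition_for(message["sent_at"])
    partitions.setdefault(name, []).append(message)
    return name


def messages_in_month(year: int, month: int) -> list:
    """Read one month of messages; only a single partition is scanned."""
    return partitions.get(f"messages_{year:04d}_{month:02d}", [])


insert_message({"sent_at": datetime(2024, 9, 14, 10, 30), "text": "hello"})
insert_message({"sent_at": datetime(2024, 8, 2, 9, 0), "text": "older message"})
```

A query for recent messages then touches only the partition for the current month, which is the performance benefit the example relies on.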

Hash partitioning: The data is distributed across partitions by applying a hash function to a key, which spreads rows roughly evenly.
List partitioning: The data is partitioned based on lists of values.

Example

In an application like Uber, with a large number of users and drivers in many countries, list partitioning can be used to partition users by country.
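As a rough sketch of the two schemes, the functions below pick a partition by hash and by list. The partition count, the country groupings, and the partition names are assumptions made for illustration. A stable hash from hashlib is used because Python's built-in hash() for strings varies between processes.

```python
import hashlib

NUM_PARTITIONS = 4  # assumed partition count, for illustration only


def hash_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical list-partitioning rules: each partition owns a list of countries.
COUNTRY_PARTITIONS = {
    "EG": "users_mena",
    "SA": "users_mena",
    "US": "users_americas",
    "BR": "users_americas",
}
DEFAULT_PARTITION = "users_other"  # catch-all for unlisted countries


def list_partition(country_code: str) -> str:
    """Return the partition whose value list contains the country."""
    return COUNTRY_PARTITIONS.get(country_code.upper(), DEFAULT_PARTITION)
```

The same key always maps to the same hash partition, so lookups by key can go straight to one partition, while list partitioning keeps all users from a given country together.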

Vertical Partitioning

In this technique, large tables with many columns are split into smaller tables, with the columns grouped according to which queries access them. This technique is commonly used on a single server.
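A minimal sketch of a vertical split, using Python's built-in sqlite3 module with an in-memory database. The table names (users_core, users_profile), the columns, and the sample row are hypothetical and chosen only to show the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Frequently read columns stay in a narrow table.
    CREATE TABLE users_core (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT NOT NULL
    );
    -- Rarely read, bulky columns move to a second table with the same key.
    CREATE TABLE users_profile (
        user_id INTEGER PRIMARY KEY REFERENCES users_core(id),
        bio TEXT,
        preferences TEXT
    );
    """
)
conn.execute("INSERT INTO users_core VALUES (?, ?, ?)", (1, "Sara", "sara@example.com"))
conn.execute(
    "INSERT INTO users_profile VALUES (?, ?, ?)",
    (1, "Long biography text", '{"theme": "dark"}'),
)


def get_login_info(user_id: int):
    """Common query: reads only the narrow table."""
    return conn.execute(
        "SELECT name, email FROM users_core WHERE id = ?", (user_id,)
    ).fetchone()


def get_full_user(user_id: int):
    """Less common query: joins both tables on the shared key."""
    return conn.execute(
        "SELECT c.name, c.email, p.bio FROM users_core c "
        "JOIN users_profile p ON p.user_id = c.id WHERE c.id = ?",
        (user_id,),
    ).fetchone()
```

The trade-off is that queries needing columns from both groups now pay for a join, so the column grouping should follow the actual access patterns.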

Sharding

Sharding is a type of multiple servers partitioning in which the data is partitioned horizontally and distributed across servers, with each server having the same database schema. Some databases, such as MongoDB, support sharding natively.
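To make the idea concrete, here is a small application-level sketch in which dictionaries stand in for the servers. The `ShardRouter` class, the shard names, and the choice of customer ID as the sharding key are assumptions for illustration. Databases with native sharding handle this routing internally.

```python
import hashlib


class ShardRouter:
    """Route each customer's rows to one shard (dictionaries stand in for servers)."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)
        # Every shard holds the same structure: customer_id -> list of orders.
        self.shards = {name: {} for name in self.shard_names}

    def shard_for(self, customer_id: str) -> str:
        """Pick a shard from a stable hash of the sharding key."""
        digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shard_names)
        return self.shard_names[index]

    def add_order(self, customer_id: str, order: dict) -> None:
        """Write an order to the shard that owns this customer."""
        shard = self.shards[self.shard_for(customer_id)]
        shard.setdefault(customer_id, []).append(order)

    def orders_for(self, customer_id: str) -> list:
        """Read a customer's orders from a single shard."""
        return self.shards[self.shard_for(customer_id)].get(customer_id, [])


router = ShardRouter(["shard-a", "shard-b", "shard-c"])
router.add_order("customer-42", {"item": "book"})
router.add_order("customer-42", {"item": "pen"})
```

Note that a plain modulo over the shard count remaps most keys when a shard is added or removed, so this sketch does not address rebalancing.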

Challenges in implementing sharding

The sharding key should be chosen carefully, because it determines how data is distributed across servers. If the key is poorly chosen, related data ends up on different servers, which makes joins difficult and hurts performance. The key should keep related data, or most of it, on the same server. For example, if customer ID is chosen, all of a customer's data should live on the same server. Another example of a sharding key is geographic location.
A poorly chosen sharding key can also leave data volumes uneven across servers. When some servers hold much more data than others, performance suffers because more queries are routed to the heavily loaded servers and each of those queries scans a larger dataset. In that case the data must be rebalanced across servers, which is complex and costly.
The routing logic must be considered. It may be implemented in the application or handled by external tools such as Spock Proxy for MySQL.
Distributed transactions must be handled, and data consistency between servers must be guaranteed.
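One way to compare candidate sharding keys before committing to one is to measure how evenly each would spread existing rows. The sketch below is an assumption-laden illustration: the sample rows, the two candidate key functions, and the imbalance metric (largest shard load divided by the mean load) are all hypothetical.

```python
from collections import Counter


def shard_load(rows, num_shards, shard_of):
    """Count how many rows each shard would receive under a candidate key."""
    counts = Counter(shard_of(row) for row in rows)
    return [counts.get(i, 0) for i in range(num_shards)]


def imbalance_ratio(loads):
    """Largest shard load divided by the mean load; 1.0 means perfectly even."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Hypothetical sample: 100 rows of (customer_id, country), 90 of them in one country.
rows = [(i, "US" if i < 90 else "EG") for i in range(100)]


def by_customer(row):
    """Candidate key 1: customer ID modulo the shard count."""
    return row[0] % 4


COUNTRY_TO_SHARD = {"US": 0, "EG": 1}  # hypothetical geographic assignment


def by_country(row):
    """Candidate key 2: shard chosen by country."""
    return COUNTRY_TO_SHARD[row[1]]


customer_loads = shard_load(rows, 4, by_customer)
country_loads = shard_load(rows, 4, by_country)
```

With this sample, sharding by customer ID spreads the rows evenly, while sharding by country concentrates most of them on one shard. The same measurement says nothing about keeping related data together, so it covers only the data volume concern.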

Conclusion

Partitioning and sharding enhance database performance, availability, and scalability. However, they also introduce additional complexity. It is essential to weigh the benefits against the complexities and explore other performance-enhancing techniques before implementation.
