What Is Sharding, and How to Find an Appropriate Sharding Key for a Ride Service

Davinder Singh

Head of SRE at Gojek

Published: July 2, 2024

Sharding is a database partitioning technique where large datasets are divided into smaller, more manageable pieces, called “shards,” which are distributed across multiple database servers. Each shard contains a subset of the data, which can help improve performance, scalability, and availability by spreading the load across multiple servers.
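
As a rough illustration of the idea, the sketch below routes each record to one of several shards by hashing its key. The shard count, the `shard_for`, `put`, and `get` helpers, and the in-memory dictionaries standing in for database servers are all assumptions made for this example, not part of any particular database.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, chosen only for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding key to a shard index using a stable hash."""
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process for strings and so would not give stable routing.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each dict stands in for one database server holding a subset of the data.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key: str, row: dict) -> None:
    """Store a row on the shard that owns its key."""
    shards[shard_for(key)][key] = row


def get(key: str):
    """Read a row back from the shard that owns its key."""
    return shards[shard_for(key)].get(key)
```

Because every read and write for a given key lands on the same shard, each server only handles its own slice of the data and traffic.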

In the context of a ride service, choosing the right sharding key is crucial for optimizing performance and ensuring efficient data distribution. Here are some commonly considered sharding keys for a ride service and their pros and cons:

1. User ID

Pros:

• Even distribution if users are evenly active.

• Simplifies user-specific data retrieval.

Cons:

• Imbalanced load if some users are significantly more active.

• Not optimal for location-based queries.

2. Geographic Location

Pros:

• Logical grouping of data based on regions.

• Optimizes location-based queries (e.g., finding nearby drivers).

Cons:

• Uneven distribution if some regions have more users or activity.

• More complex to manage and re-shard as geographic boundaries change.

3. Ride ID

Pros:

• Typically unique and evenly distributed.

• Useful for ride-specific queries.

Cons:

• May not optimize user or location-based queries.

• Sharding during the ride lifecycle can be complex.

4. Driver ID

Pros:

• Even distribution if drivers are evenly active.

• Useful for driver-specific data management.

Cons:

• Imbalanced load if some drivers are significantly more active.
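
The "imbalanced load" concern raised for several of these keys can be checked against a workload before committing to a key. The following is a minimal sketch under stated assumptions: the request records, the field names `user_id` and `ride_id`, and the `imbalance` metric (busiest shard's load divided by the mean load) are hypothetical and exist only for this example.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Stable hash routing, as a stand-in for a real shard router."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def imbalance(requests, key_field, num_shards=4):
    """Return busiest-shard load divided by mean load (1.0 is perfectly even)."""
    if not requests:
        return 1.0
    load = Counter(shard_for(str(r[key_field]), num_shards) for r in requests)
    mean = len(requests) / num_shards
    return max(load.values()) / mean


# Hypothetical workload: one very active user issues half of all requests,
# while every request belongs to a distinct ride.
requests = []
for i in range(1000):
    user = "user-0" if i % 2 == 0 else f"user-{i}"
    requests.append({"user_id": user, "ride_id": f"ride-{i}"})

print("user_id imbalance:", round(imbalance(requests, "user_id"), 2))
print("ride_id imbalance:", round(imbalance(requests, "ride_id"), 2))
```

In this made-up workload the shard that owns the hot user receives at least half of the traffic, so the user ID ratio is at least 2.0, while the unique ride IDs spread far more evenly. Running the same measurement on real traffic samples would show whether a candidate key suffers from this kind of skew.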

Recommended Sharding Key: Geographic Location

For a ride service, geographic location is often an effective sharding key for the following reasons:

1. Geographic Queries: Many operations in a ride service are location-based, such as matching drivers to riders, calculating ETAs, and optimizing routes.

2. Load Balancing: Sharding by geographic location spreads the load across regional shards, provided mechanisms are in place to handle regions with very different densities of activity.

3. Operational Efficiency: It enhances the efficiency of operations that are sensitive to location, which is critical for real-time ride matching and routing.

Example of Geographic Sharding

1. Dividing by Region: Split the service area into regions (e.g., cities, states, or predefined grid cells).

2. Shard Allocation: Assign specific regions to different database shards. For instance:

• Shard 1: North Region

• Shard 2: South Region

• Shard 3: East Region

• Shard 4: West Region

3. Data Distribution: Store data related to users, drivers, and rides in their corresponding regional shards.
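
The three steps above can be sketched as a small lookup. This is only an illustration: the `CENTER` coordinates, the rule that classifies a point by its dominant offset from that center, and the function names are assumptions invented for the example. A real service would more likely use city boundaries or predefined grid cells, and comparing raw latitude and longitude differences ignores that a degree of longitude covers less distance away from the equator.

```python
# Region-to-shard table mirroring the allocation listed above.
REGION_TO_SHARD = {"north": 1, "south": 2, "east": 3, "west": 4}

# Hypothetical center of the service area as (latitude, longitude).
CENTER = (12.97, 77.59)


def region_for(lat: float, lon: float) -> str:
    """Classify a point by its dominant offset from the assumed center."""
    dlat = lat - CENTER[0]
    dlon = lon - CENTER[1]
    if abs(dlat) >= abs(dlon):
        return "north" if dlat >= 0 else "south"
    return "east" if dlon >= 0 else "west"


def shard_for_location(lat: float, lon: float) -> int:
    """Return the shard number that stores data for this location."""
    return REGION_TO_SHARD[region_for(lat, lon)]


# A pickup request north of the center is stored on, and served by, shard 1.
pickup_shard = shard_for_location(13.10, 77.59)
```

With this layout, a "find nearby drivers" query only needs to touch the shard for the rider's region, except near region borders, where neighboring shards may also have to be consulted.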

Considerations for Geographic Sharding

Hotspots: Some regions may have higher activity, leading to potential hotspots. Implementing dynamic sharding or load balancing mechanisms can help mitigate this.

Geographic Changes: Be prepared to handle changes in geographic boundaries and re-shard data as needed.
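
One possible way to handle a hotspot is to let a busy region map to more than one shard and pick among them with a secondary key. The sketch below assumes a hypothetical in-memory routing table (`ROUTES`), a `split_hot_region` helper, and ride ID as the secondary key; none of these come from a specific product, and the sketch covers routing only.

```python
import hashlib

# Hypothetical routing table: each region maps to one or more shard ids.
ROUTES = {
    "north": [1],
    "south": [2],
    "east": [3],
    "west": [4],
}


def split_hot_region(region: str, extra_shard: int) -> None:
    """Spread a hot region over one additional shard."""
    if extra_shard not in ROUTES[region]:
        ROUTES[region].append(extra_shard)


def route(region: str, ride_id: str) -> int:
    """Pick a shard for a ride: by region first, then by hashed ride id."""
    candidates = ROUTES[region]
    digest = hashlib.sha256(ride_id.encode("utf-8")).digest()
    return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]
```

For example, calling `split_hot_region("north", 5)` makes new rides in the North Region land on either shard 1 or shard 5. Rows already written under the old mapping would still need to be migrated or looked up on both shards, which is the re-sharding cost noted above.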
