Architecture Talk-1: MongoDB - Sharding Architecture

Karthikeyan Thanikachalam

Aspiring Head of Data & AI Platform | Author | Generative AI Evangelist| Senior Data Architect | Cloud Migration Specialist | Cloud Certified Professional - 5x | Teradata Vantage | GCP | Azure | AWS | GenAI | AI & ML

Published: October 11, 2023

In MongoDB, a sharded cluster consists of:

Shards
Mongos
Config servers

A shard is a replica set that contains a subset of the cluster’s data.

The mongos acts as a query router for client applications, handling both read and write operations. It dispatches client requests to the relevant shards and merges the results from those shards into a single, consistent response. Clients connect to a mongos, not to individual shards.

Config servers are the authoritative source of sharding metadata. The sharding metadata reflects the state and organization of the sharded data. The metadata contains the list of sharded collections, routing information, etc.
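The division of labour between these three components can be sketched in a few lines of Python. This is a toy model rather than MongoDB's implementation: the `CONFIG_METADATA` table, the `shop.orders` namespace, the shard names, and the `route` function are all hypothetical, and a real mongos caches metadata from the config servers instead of reading a dictionary.

```python
# Toy model of a sharded cluster; every name here is an illustrative assumption.

# "Config server" metadata: for each sharded collection, the shard key field
# and a list of (lower_bound_inclusive, upper_bound_exclusive, shard_name).
CONFIG_METADATA = {
    "shop.orders": {
        "shard_key": "customer_id",
        "ranges": [
            (0, 1000, "shardA"),
            (1000, 2000, "shardB"),
            (2000, 3000, "shardC"),
        ],
    }
}


def route(namespace, query):
    """Return the shards a query must visit, as a mongos-like router would."""
    meta = CONFIG_METADATA[namespace]
    key = meta["shard_key"]
    all_shards = sorted({shard for _, _, shard in meta["ranges"]})
    if key not in query:
        # No shard key in the filter: the router has to ask every shard.
        return all_shards
    value = query[key]
    for low, high, shard in meta["ranges"]:
        if low <= value < high:
            return [shard]
    return []


targeted = route("shop.orders", {"customer_id": 1500})
broadcast = route("shop.orders", {"status": "shipped"})
print(targeted)   # query includes the shard key
print(broadcast)  # query omits the shard key
```

The client only ever calls `route`; it never needs to know which shard holds which range, which mirrors the way applications talk to a mongos and not to shards directly.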

In its simplest configuration, a sharded cluster has a single shard, with clients connecting through a mongos and the config servers holding the sharding metadata.

Sharding Benefits

Sharding allows you to scale your database horizontally to handle increased load. It does this by increasing read/write throughput and storage capacity, and it can also keep data close to where it is used. Let’s look at each of those in a little more detail:

Increased read/write throughput: You can take advantage of parallelism by distributing the data set across multiple shards. Let’s say one shard can process one thousand operations per second. In the ideal case, where operations are spread evenly across shards, each additional shard adds roughly another one thousand operations per second of throughput.

Increased storage capacity: Similarly, by increasing the number of shards, you can also increase total storage capacity. Let’s say one shard can hold 4TB of data. Each additional shard would increase your total storage by 4TB, so capacity grows with the number of shards you are able to add.

Data Locality: Zone Sharding allows you to easily create distributed databases to support geographically distributed apps, with policies enforcing data residency within specific regions. Each zone can have one or more shards.
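As a back-of-the-envelope check on the throughput and storage figures above, the following sketch multiplies the per-shard numbers used in the text (1,000 operations per second and 4TB) by the shard count. The linear model is an idealisation and the function name is made up for illustration; real clusters lose some of this capacity to uneven key distribution and to queries that touch several shards.

```python
def ideal_cluster_capacity(shards, ops_per_shard=1000, tb_per_shard=4):
    """Best-case totals, assuming load and data spread perfectly evenly."""
    return {
        "ops_per_second": shards * ops_per_shard,
        "storage_tb": shards * tb_per_shard,
    }


capacity = ideal_cluster_capacity(5)
print(capacity)
```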

Data Distribution

Shard Key

MongoDB shards at the collection level. You choose which collection(s) you want to shard. MongoDB uses the shard key to distribute a collection’s documents across shards. MongoDB splits the data into “chunks”, by dividing the span of shard key values into non-overlapping ranges. MongoDB then attempts to distribute those chunks evenly among the shards in the cluster.
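To make the idea of non-overlapping chunk ranges concrete, here is a minimal sketch. The split points, shard names, and round-robin placement are assumptions chosen for illustration; MongoDB picks split points and placement itself. Each chunk covers a half-open range (lower bound inclusive, upper bound exclusive), so every shard key value belongs to exactly one chunk.

```python
import bisect

# Hypothetical split points for a shard key; chunk i covers
# [bounds[i], bounds[i + 1]), so the ranges cannot overlap.
MIN_KEY, MAX_KEY = float("-inf"), float("inf")
bounds = [MIN_KEY, 100, 200, 300, MAX_KEY]


def chunk_index(shard_key_value):
    """Locate the chunk whose range contains the given shard key value."""
    return bisect.bisect_right(bounds, shard_key_value) - 1


def assign_round_robin(num_chunks, shards):
    """Spread chunks over shards as evenly as a simple rotation allows."""
    return {i: shards[i % len(shards)] for i in range(num_chunks)}


placement = assign_round_robin(len(bounds) - 1, ["shardA", "shardB"])
for value in (42, 100, 250, 999):
    idx = chunk_index(value)
    print(value, "-> chunk", idx, "on", placement[idx])
```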

Shard keys are based on fields inside each document. The values in those fields determine which shard a document resides on, according to the chunk ranges and the number of chunks. This mapping is stored in the config server replica set.

The shard key has a direct impact on the cluster’s performance and should be chosen carefully. A suboptimal shard key can lead to performance or scaling issues due to uneven chunk distribution. The choice is not permanent, but changing it has a cost: MongoDB 4.4 and later can refine a shard key by adding suffix fields, and MongoDB 5.0 and later can reshard a collection onto a new key. See the MongoDB documentation on choosing a shard key for guidance.

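A background process known as the “balancer” automatically migrates chunks between shards to keep the data evenly distributed. The result is approximately even, not exactly equal: the balancer only acts once the imbalance between shards passes a threshold.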
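The balancing idea can be illustrated with a small simulation. This is a simplified sketch and not the real balancer: it counts chunks, moves one chunk at a time, and uses a made-up `threshold` parameter, whereas MongoDB's balancer has its own migration thresholds and, in recent versions, balances on data size.

```python
def balance(chunk_counts, threshold=1):
    """Move one chunk at a time from the fullest shard to the emptiest one.

    Stops when the gap between them is at most `threshold` chunks.
    Returns the final counts and the list of (source, destination) moves.
    """
    if threshold < 1:
        # A gap of one chunk cannot always be closed, so require at least 1.
        raise ValueError("threshold must be at least 1")
    counts = dict(chunk_counts)
    migrations = []
    while True:
        fullest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        if counts[fullest] - counts[emptiest] <= threshold:
            return counts, migrations
        counts[fullest] -= 1
        counts[emptiest] += 1
        migrations.append((fullest, emptiest))


final_counts, moves = balance({"shardA": 8, "shardB": 2, "shardC": 2})
print(final_counts, len(moves))
```

Starting from 8, 2, and 2 chunks, the loop ends with four chunks on each shard after four migrations, all of them out of `shardA`.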

Sharding Strategy

MongoDB supports two sharding strategies for distributing data across sharded clusters:

Ranged Sharding
Hashed Sharding

Ranged sharding divides data into ranges based on the shard key values. Each chunk is then assigned a range based on the shard key values.

Shard key values that are “close” to one another are more likely to reside in the same chunk. This allows for targeted operations, as a mongos can route an operation to only the shards that contain the required data.
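A short sketch shows why ranged sharding makes range queries cheap to route. The chunk table and shard names below are hypothetical; the point is only that a query range overlaps a small, computable set of chunks.

```python
# Hypothetical ranged chunks: (lower_inclusive, upper_exclusive, shard).
chunks = [
    (0, 100, "shardA"),
    (100, 200, "shardB"),
    (200, 300, "shardC"),
]


def shards_for_range(low, high):
    """Shards whose chunks overlap the inclusive query range [low, high]."""
    return sorted(
        {shard for c_low, c_high, shard in chunks
         if c_low <= high and low < c_high}
    )


narrow = shards_for_range(110, 150)
wide = shards_for_range(90, 210)
print(narrow)  # range falls inside one chunk
print(wide)    # range spans several chunks
```

A narrow range touches a single shard, while a range that crosses chunk boundaries has to visit each shard that owns an overlapping chunk.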

Hashed Sharding involves computing a hash of the shard key field’s value. Each chunk is then assigned a range based on the hashed shard key values.

While a range of shard keys may be “close”, their hashed values are unlikely to be on the same chunk. Data distribution based on hashed values facilitates more even data distribution, especially in data sets where the shard key changes monotonically. However, hashed sharding does not provide efficient range-based operations: because adjacent key values are scattered across chunks, a range query typically has to be sent to many or all shards.
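The effect on monotonically increasing keys can be seen in a small experiment. This sketch is an illustration under assumptions: MD5 truncated to 64 bits stands in for MongoDB's hash function, and the four equal-width chunks and the `chunk_width` of 1000 are made up. It counts where 400 consecutive new keys would land under each strategy.

```python
import hashlib
from collections import Counter

NUM_CHUNKS = 4
HASH_SPACE = 2 ** 64


def hashed_chunk(key):
    """Chunk index from a 64-bit hash of the key (MD5 is a stand-in here)."""
    digest = hashlib.md5(str(key).encode()).digest()
    hashed = int.from_bytes(digest[:8], "big")
    # Split the hash space into NUM_CHUNKS equal ranges.
    return hashed * NUM_CHUNKS // HASH_SPACE


def ranged_chunk(key, chunk_width=1000):
    """Chunk index when chunks are consecutive ranges of the raw key."""
    return min(key // chunk_width, NUM_CHUNKS - 1)


# Monotonically increasing keys, e.g. an auto-incrementing counter.
new_keys = range(3000, 3400)
ranged_hits = Counter(ranged_chunk(k) for k in new_keys)
hashed_hits = Counter(hashed_chunk(k) for k in new_keys)
print("ranged:", dict(ranged_hits))
print("hashed:", dict(hashed_hits))
```

Under ranged sharding every new key falls into the last chunk, so one shard absorbs all the inserts. Under hashed sharding the same keys are spread over all four chunks, at the cost of losing the locality that range queries rely on.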
