Understanding Schema Design Anti-Patterns in MongoDB

Here are some common anti-patterns in MongoDB schema design, along with why they cause problems and how to avoid them.

1. Massive Arrays

Storing massive, unbounded arrays inside MongoDB documents is generally bad practice. MongoDB limits a document to 16 MB, and an unbounded array can eventually push a document past that limit. Additionally, reading arrays and building indexes on them become less performant as the array grows.

Solution: use the Extended Reference Pattern, a mixture of embedding and referencing in which we duplicate only the data that is frequently accessed together. Because the commonly read fields are already in the document, most reads no longer need a $lookup, which reduces cost.

Example: an unbounded employees array.
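
As a rough sketch of the Extended Reference Pattern, the snippet below uses plain Python dictionaries to stand in for MongoDB documents. The building and employee fields, and the helper name to_extended_reference, are hypothetical and chosen only for illustration. Instead of a building document holding an ever-growing employees array, each employee document references its building by _id and carries a copy of the few building fields that are read together with it.

```python
from typing import Any, Dict, Iterable

Document = Dict[str, Any]


def to_extended_reference(
    employee: Document,
    building: Document,
    copied_fields: Iterable[str] = ("name",),
) -> Document:
    """Return an employee document that references its building by _id
    and duplicates only the listed, frequently read building fields."""
    reference = {"_id": building["_id"]}
    for field in copied_fields:
        reference[field] = building[field]
    result = dict(employee)  # shallow copy so the input is left untouched
    result["building"] = reference
    return result


# Hypothetical sample data: the building has no employees array at all.
building = {"_id": "b1", "name": "City Hall", "city": "Springfield", "floors": 4}
employee = {"_id": "e1", "name": "Ada"}

employee_doc = to_extended_reference(employee, building)
print(employee_doc)
```

The duplicated fields have to be updated whenever the source document changes, so this works best when the copied fields rarely change.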

2. Massive Number of Collections

MongoDB automatically creates an index on the _id field of every collection. While this index is small for empty or small collections, thousands of empty or unused indexes can begin to drain resources. Collections typically require a few more indexes to support efficient queries, and these indexes add up.

In general, MongoDB recommends limiting collections to 10,000 per replica set. When users begin exceeding 10,000 collections, they typically see decreases in performance.

3. Unnecessary Indexes

Indexes allow MongoDB to efficiently query data. If a query does not have an index to support it, MongoDB performs a collection scan, which can be very slow. However, indexes take up space. Each index is at least 8 KB and grows with the number of associated documents. Thousands of indexes can begin to drain resources.

Indexes can impact the storage engine's performance. The WiredTiger storage engine stores a file for each collection and each index. WiredTiger opens all files upon startup, so performance will decrease when there is an excessive number of collections and indexes.

Indexes can also impact write performance. Whenever a document is created, updated, or deleted, any associated indexes must also be updated, negatively affecting write performance. In general, MongoDB recommends limiting a collection to a maximum of 50 indexes.

To avoid the anti-pattern of unnecessary indexes, identify which indexes are truly necessary. Unnecessary indexes typically fall into one of two categories:
a. Rarely used or not used at all.
b. Redundant because another compound index covers it.
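
The second category, redundancy, can be checked mechanically. The sketch below is a simplified illustration in plain Python, not a MongoDB tool: index definitions are modeled as lists of (field, direction) pairs, and the function name and sample index names are hypothetical. It flags an index when its keys are an exact prefix of a longer index. It deliberately ignores index options such as unique, sparse, partial filters, and TTL, which can make a prefix index necessary after all.

```python
from typing import Dict, List, Tuple

IndexKeys = List[Tuple[str, int]]


def find_prefix_redundant(indexes: Dict[str, IndexKeys]) -> Dict[str, str]:
    """Map each redundant index name to the name of a longer index
    whose leading keys are identical to it."""
    redundant: Dict[str, str] = {}
    for name, keys in indexes.items():
        if name == "_id_":
            continue  # the default _id index cannot be dropped
        for other_name, other_keys in indexes.items():
            if other_name == name:
                continue
            is_longer = len(other_keys) > len(keys)
            if is_longer and other_keys[: len(keys)] == keys:
                redundant[name] = other_name
                break
    return redundant


# Hypothetical index definitions for one collection.
indexes = {
    "_id_": [("_id", 1)],
    "last_name_1": [("last_name", 1)],
    "last_name_1_first_name_1": [("last_name", 1), ("first_name", 1)],
    "email_1": [("email", 1)],
}

print(find_prefix_redundant(indexes))
```

Treat the result as a list of candidates to review rather than a list of indexes to drop.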

4. Bloated Documents

One of the rules in MongoDB schema design is that data accessed together should be stored together. However, this doesn't mean that data related to each other should always be stored together. Data that is related isn't necessarily accessed together frequently. We might have large, bloated documents containing related information that isn't accessed often. In such cases, separate the information into smaller documents in separate collections and use references to connect them.
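
A minimal sketch of that split, using plain Python dictionaries in place of MongoDB documents. The product fields, the helper name split_document, and the product_id reference field are all hypothetical. The frequently read fields stay in a small summary document, and everything else moves to a details document that points back to it.

```python
from typing import Any, Dict, Set, Tuple

Document = Dict[str, Any]


def split_document(
    doc: Document, hot_fields: Set[str], ref_field: str
) -> Tuple[Document, Document]:
    """Split one large document into a small summary document and a
    details document that references the summary by its _id."""
    summary: Document = {"_id": doc["_id"]}
    details: Document = {ref_field: doc["_id"]}
    for key, value in doc.items():
        if key == "_id":
            continue
        target = summary if key in hot_fields else details
        target[key] = value
    return summary, details


# Hypothetical bloated document: only name and price are read often.
product = {
    "_id": "p1",
    "name": "Desk lamp",
    "price": 29.99,
    "full_description": "A long marketing text...",
    "manual_text": "A long user manual...",
}

summary, details = split_document(product, {"name", "price"}, "product_id")
print(summary)
print(details)
```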

The WiredTiger storage engine keeps frequently accessed indexes and documents in memory. When the working set fits in the RAM allotment, MongoDB can query from memory instead of from disk. Queries from memory are faster, so the goal is to keep your most popular documents small enough to fit in the RAM allotment. By default, the working set's RAM allotment is the larger of 50% of (RAM - 1 GB) or 256 MB.
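
The sizing rule can be worked through with a few lines of Python. This is only an arithmetic sketch: the function name is hypothetical, and it assumes that GB and MB mean binary units (GiB and MiB).

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_ram_allotment(total_ram_bytes: int) -> int:
    """Return the larger of 50% of (RAM - 1 GB) or 256 MB, in bytes."""
    return max((total_ram_bytes - GIB) // 2, 256 * MIB)


# A 16 GB server gets 7.5 GB; a 1 GB server falls back to the 256 MB floor.
for ram_gib in (1, 4, 16):
    allotment = default_ram_allotment(ram_gib * GIB)
    print(f"{ram_gib} GB RAM -> {allotment / GIB:.2f} GB allotment")
```

On a 4 GB machine, for example, only about 1.5 GB is available for the working set, which is why keeping popular documents small matters.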



Example: a bloated document.


5. Separating Data That is Accessed Together

Normalizing data to save space and reduce duplication can feel like second nature to those with a relational database background. However, separating data that is frequently accessed together is an anti-pattern in MongoDB. MongoDB provides the $lookup operation to join information from more than one collection. $lookup works well for infrequent operations or for analytical queries that can run overnight without a time limit. However, $lookup operations are slow and resource-intensive compared to operations that read from a single collection. Instead of splitting data that is frequently used together across multiple collections, use embedded documents and arrays to keep it in a single collection. Consider the Subset Pattern and the Extended Reference Pattern to optimize data access.


Example: data that is accessed together but stored in different collections.
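
To illustrate the reshaping, here is a small sketch in plain Python, with lists of dictionaries standing in for two collections. The users and addresses data, the user_id foreign key, and the helper name embed_children are hypothetical. It folds the child documents into an array on each parent, so that one read returns everything that would otherwise need a $lookup.

```python
from collections import defaultdict
from typing import Any, Dict, List

Document = Dict[str, Any]


def embed_children(
    parents: List[Document],
    children: List[Document],
    foreign_key: str,
    field: str,
) -> List[Document]:
    """Return copies of the parents with their children embedded as an
    array under `field`; the foreign key is dropped from each child."""
    grouped: Dict[Any, List[Document]] = defaultdict(list)
    for child in children:
        embedded = {k: v for k, v in child.items() if k != foreign_key}
        grouped[child[foreign_key]].append(embedded)
    return [{**parent, field: grouped.get(parent["_id"], [])} for parent in parents]


# Hypothetical data split across two collections.
users = [{"_id": 1, "name": "Ada"}, {"_id": 2, "name": "Grace"}]
addresses = [
    {"user_id": 1, "city": "London"},
    {"user_id": 1, "city": "Paris"},
]

embedded_users = embed_children(users, addresses, "user_id", "addresses")
print(embedded_users)
```

Embedding like this only makes sense when the embedded array stays bounded; otherwise it recreates the massive-array anti-pattern.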


6. Case-Insensitive Queries Without Case-Insensitive Indexes

Using $regex with the i option is not performant because $regex cannot fully utilize case-insensitive indexes. There are two better options. First, create a case-insensitive index with a collation strength of 1 or 2, and specify the same collation in your query. Second, set the default collation strength of your collection to 1 or 2 when you create it, and avoid specifying a different collation in your queries and indexes.
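
The sketch below shows the shape of the collation options involved, using plain Python dictionaries rather than a live database connection. The helper names and the "en" locale are assumptions made for illustration; a driver would receive these options when creating the index and when running the query. The point it demonstrates is that the query's collation has to match the index's collation for the index to be usable for string comparisons.

```python
from typing import Any, Dict

Collation = Dict[str, Any]


def case_insensitive_collation(locale: str = "en", strength: int = 2) -> Collation:
    """Build a collation document for case-insensitive comparison.

    Strength 1 compares base characters only (ignores case and diacritics);
    strength 2 still ignores case but distinguishes diacritics.
    """
    if strength not in (1, 2):
        raise ValueError("case-insensitive comparison needs strength 1 or 2")
    return {"locale": locale, "strength": strength}


def query_can_use_index(index_collation: Collation, query_collation: Collation) -> bool:
    """A query can use the index for string comparisons only when it
    specifies the same collation as the index."""
    return index_collation == query_collation


index_collation = case_insensitive_collation()
matching_query = case_insensitive_collation()
mismatched_query = {"locale": "en", "strength": 3}  # case-sensitive

print(query_can_use_index(index_collation, matching_query))
print(query_can_use_index(index_collation, mismatched_query))
```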


Credits: Lauren Schaefer and Daniel Coupal, MongoDB

Refer to the original article: https://www.mongodb.com/developer/products/mongodb/schema-design-anti-pattern-summary/


#MongoDB #BestPractices #Database #DesignPatterns #SystemDesign #SoftwareArchitecture #Scalability

