How do I model a one-to-N relationship in MongoDB?
When designing a MongoDB schema, start with a question: is the relationship “one-to-few,” “one-to-many,” or “one-to-squillions”? Depending on the answer, you’d use a different format to model the relationship.
Basics
-> Modeling one-to-few:
An example of “one-to-few” might be the addresses for a person. This is a good use case for embedding. You’d put the addresses in an array inside your Person object.
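A minimal sketch of the embedded form, using a plain Python dict as a stand-in for the MongoDB document. The field names and values are made up for illustration and are not taken from a real schema.

```python
# A Person document with its addresses embedded as an array.
# All field names and values here are illustrative assumptions.
person = {
    "name": "Ada Example",
    "addresses": [
        {"street": "1 First St", "city": "Anytown", "cc": "USA"},
        {"street": "2 Second Ave", "city": "New York", "cc": "USA"},
    ],
}

# Reading the person returns the addresses as well; no second query is needed.
cities = [address["city"] for address in person["addresses"]]
print(cities)
```

The addresses only exist inside the person, which is what makes them cheap to read together and awkward to query on their own.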
The main advantage is that you don’t have to perform a separate query to get the embedded details; the main disadvantage is that you have no way of accessing the embedded details as stand-alone entities.
-> Modeling one-to-many:
An example of “one-to-many” might be parts for a product in a replacement parts ordering system. Each product may have up to several hundred replacement parts, but never more than a couple of thousand or so. You’d put the ObjectIDs of the parts in an array in the product document.
Each Part would have its own document, and each Product would have its own document containing an array of ObjectID references to the Parts that make up that Product.
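The shape of the two kinds of document might look like the sketch below. Plain dicts stand in for MongoDB documents, short strings stand in for ObjectID values, and every field name except "catalog_number" and "_id" is an assumption made for illustration.

```python
# Each part is a stand-alone document in its own collection.
parts = [
    {"_id": "p1", "partno": "A-100", "name": "grommet", "price": 3.99},
    {"_id": "p2", "partno": "A-200", "name": "washer", "price": 0.49},
    {"_id": "p3", "partno": "B-300", "name": "gasket", "price": 1.25},
]

# The product holds references to its parts, not embedded copies.
product = {
    "_id": "prod1",
    "name": "Example Product",
    "catalog_number": 1234,
    "parts": ["p1", "p2", "p3"],  # stand-ins for ObjectID references
}

# Sanity check: every reference in the product points at an existing part.
known_ids = {part["_id"] for part in parts}
dangling = [ref for ref in product["parts"] if ref not in known_ids]
print(dangling)
```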
You would then use an application-level join to retrieve the parts for a particular product: first fetch the product document, then fetch all the parts whose IDs appear in its array.
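The two-step join can be sketched as follows. Lists of dicts stand in for the "products" and "parts" collections, and the helper names `find_product` and `find_parts` are hypothetical. With a real driver such as PyMongo, the two steps would roughly correspond to a `find_one` on the catalog number followed by a `find` with an `$in` filter on `_id`.

```python
products = [
    {"_id": "prod1", "catalog_number": 1234, "parts": ["p1", "p3"]},
    {"_id": "prod2", "catalog_number": 5678, "parts": ["p2"]},
]
parts = [
    {"_id": "p1", "name": "grommet"},
    {"_id": "p2", "name": "washer"},
    {"_id": "p3", "name": "gasket"},
]


def find_product(products, catalog_number):
    """First query: fetch the product by its catalog number."""
    for product in products:
        if product["catalog_number"] == catalog_number:
            return product
    return None


def find_parts(parts, part_ids):
    """Second query: fetch every part whose _id is in the given list."""
    wanted = set(part_ids)
    return [part for part in parts if part["_id"] in wanted]


product = find_product(products, 1234)
product_parts = find_parts(parts, product["parts"])
part_names = [part["name"] for part in product_parts]
print(part_names)
```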
For efficient operation, you’d need to have an index on "products.catalog_number", the field used to look up the product. Note that there will always be an index on "parts._id", so the second query, which fetches the parts by ID, will always be efficient.
Each Part is a stand-alone document, so it’s easy to search them and update them independently. One trade-off of this schema is having to perform a second query to get details about the Parts for a Product.
-> Modeling one-to-squillions:
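An example of “one-to-squillions” might be an event logging system that collects log messages for different machines. Any given host could generate enough messages to overflow MongoDB’s 16 MB document size limit, even if all you stored in the array was the ObjectID. In this case, you’d have a document for the host and store the ObjectID of the host in the document for each log message.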
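A minimal sketch of this parent-referencing layout, again with dicts standing in for documents and a short string standing in for the host's ObjectID. The field names, the sample values, and the helper `recent_messages` are assumptions for illustration.

```python
# One document for the host.
host = {"_id": "h1", "name": "web01.example.com", "ipaddr": "192.0.2.10"}

# Each log message carries a reference back to its host.
log_messages = [
    {"time": 1, "message": "service started", "host": "h1"},
    {"time": 2, "message": "disk almost full", "host": "h1"},
    {"time": 3, "message": "unrelated event", "host": "h2"},
    {"time": 4, "message": "service restarted", "host": "h1"},
]


def recent_messages(log_messages, host_id, limit):
    """Return the newest `limit` messages that reference the given host."""
    matching = [m for m in log_messages if m["host"] == host_id]
    matching.sort(key=lambda m: m["time"], reverse=True)
    return matching[:limit]


latest = [m["message"] for m in recent_messages(log_messages, host["_id"], 2)]
print(latest)
```

The host document never grows, no matter how many messages refer to it.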
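Based on the cardinality of the relationship and on whether the “N”-side entities need to stand alone, you can pick one of the three basic One-to-N schema designs: embed the N side if the relationship is one-to-few, keep an array of references to the N-side documents if it is one-to-many, and store a reference to the “one” side in each N-side document if it is one-to-squillions.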
Intermediate
-> Two-way referencing:
If you want to get a little bit fancier, you can combine the two techniques and include both styles of reference in your schema, having both references from the “one” side to the “many” side and references from the “many” side to the “one” side.
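As a sketch of what two-way referencing implies, the example below reuses the Product and Part entities from the one-to-many section. Dicts keyed by ID stand in for the two collections, and the field names and the helper `move_part` are hypothetical.

```python
products = {
    "prod1": {"name": "Widget", "parts": ["p1", "p2"]},  # "one" -> "many"
    "prod2": {"name": "Gadget", "parts": []},
}
parts = {
    "p1": {"name": "bolt", "product": "prod1"},  # "many" -> "one"
    "p2": {"name": "nut", "product": "prod1"},
}


def move_part(part_id, new_product_id):
    """Reassign a part; both styles of reference have to be updated."""
    old_product_id = parts[part_id]["product"]
    products[old_product_id]["parts"].remove(part_id)
    products[new_product_id]["parts"].append(part_id)
    parts[part_id]["product"] = new_product_id


move_part("p2", "prod2")
print(products["prod1"]["parts"], products["prod2"]["parts"])
```

Either side can now find the other directly, at the price of touching more than one document whenever the relationship changes.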
-> Database denormalization:
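Denormalization is the process of duplicating fields or deriving new fields from existing ones. Denormalized databases can improve read and query performance in a variety of cases, such as when a field from the “one” side (for example, the Product name) is copied into each Part document so that it can be read without a second query.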
However, if you’ve denormalized the Product name into the Part document, then when you update the Product name you must also update every place it occurs in the ‘parts’ collection to avoid data anomalies. This is likely to be a more expensive update, since you’re updating multiple Parts instead of a single Product.
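The update cost can be illustrated with a small sketch. Dicts stand in for documents, and the field names "product" and "product_name" as well as the helper `rename_product` are assumptions. With PyMongo, the loop would roughly correspond to a single `update_many` call with a `$set` on the parts collection.

```python
product = {"_id": "prod1", "name": "Widget"}

# Each part carries a denormalized copy of its product's name.
parts = [
    {"_id": "p1", "name": "bolt", "product": "prod1", "product_name": "Widget"},
    {"_id": "p2", "name": "nut", "product": "prod1", "product_name": "Widget"},
    {"_id": "p3", "name": "gear", "product": "prod2", "product_name": "Gadget"},
]


def rename_product(product, parts, new_name):
    """Rename a product and refresh every denormalized copy of its name."""
    product["name"] = new_name
    updated = 0
    for part in parts:
        if part["product"] == product["_id"]:
            part["product_name"] = new_name
            updated += 1
    return updated  # number of Part documents that had to change


updated = rename_product(product, parts, "Widget Pro")
print(updated)  # 2
```

Reads of a Part no longer need the Product, but one rename now fans out to every Part that belongs to it.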