All about NoSQL - MongoDB
Varshini G
1) What is NoSQL?
NoSQL (Not Only SQL) is a type of database that provides a flexible alternative to traditional relational databases. It is designed to handle unstructured, semi-structured, and structured data efficiently. NoSQL databases are widely used in big data applications, real-time analytics, and cloud-based systems.
2) Why MongoDB?
MongoDB is one of the most popular NoSQL databases, known for its:
Document-oriented storage (uses JSON-like BSON format)
Schema flexibility (dynamic schemas)
Scalability (horizontal scaling using sharding)
High performance (fast reads/writes)
Rich query language (support for indexing, aggregation, and geospatial queries)
3) Key Features of MongoDB:
Collections and Documents: Instead of tables and rows, MongoDB stores data in collections and documents.
Flexible Schema: No predefined schema, allowing varied structures.
Indexing: Supports various types of indexes for faster queries.
Replication: Ensures high availability via replica sets.
Sharding: Distributes data across multiple servers for scalability.
Aggregation Framework: Enables complex data processing.
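To make the "collections and documents" and "flexible schema" points concrete, here is a small plain-JavaScript sketch. The documents and field names are made up for illustration; the point is only that two documents in the same collection do not have to share one fixed set of fields.

```javascript
// Two made-up documents that could live in the same collection:
// they share some fields but are not forced into one fixed schema.
const userDocuments = [
  { name: "Alice", age: 25, city: "NY" },
  { name: "Bob", email: "bob@example.com", addresses: [{ city: "CA", primary: true }] },
];

// List the fields each document actually carries.
const fieldsPerDocument = userDocuments.map((doc) => Object.keys(doc));

console.log(fieldsPerDocument);
// [ [ 'name', 'age', 'city' ], [ 'name', 'email', 'addresses' ] ]
```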
4) When to Use NoSQL?
High scalability needed
Unstructured or semi-structured data
High read/write throughput
Real-time big data processing
Flexible schema requirements
5) MongoDB Deployment Options
6) MongoDB vs. SQL Databases
7) Basic MongoDB Operations
7.1) Create/Select Database
use myUser
db.createCollection("myUser")
7.2) Inserting Document(s)
db.myUser.insertOne({ name: "Alice", age: 25, city: "NY" }); // Inserts a single document
db.myUser.insertMany([ { name: "Alice", age: 25, city: "NY" }, { name: "Bob", age: 28, city: "CA" } ]); // Inserts multiple documents at once
7.3) Retrieving Document(s)
db.myUser.find(); // Fetch all documents
db.myUser.find({ age: { $gt: 25 } }); // Fetch data using operators
db.myUser.findOne({ name: "John Doe" }); // Fetch the first document that matches the given filter
7.4) Updating Document(s)
db.myUser.updateOne({ name: "John Doe" }, { $set: { age: 31 } }); // For updating a specific field attribute
db.myUser.updateMany({}, { $set: { status: "active" } }); // Updates all documents in the collection
7.5) Deleting Document(s)
db.myUser.deleteOne({ name: "Alice" }); // Delete single document
db.myUser.deleteMany({ age: { $lt: 27 } }); // Deletes all documents that match the condition
7.6) Indexing for Performance
db.myUser.createIndex({ email: 1 }); // Creates an index on the email field
8) Aggregation in MongoDB
The aggregation framework in MongoDB provides powerful data processing capabilities, similar to SQL's GROUP BY, COUNT(*), and aggregate functions. It allows grouping, filtering, sorting, transforming, and computing values across collections.
8.1) Aggregation Pipeline
The pipeline consists of a series of stages where each stage processes input documents and passes the output to the next stage.
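The idea of stages feeding each other can be modelled in plain JavaScript, without a database. This is only a sketch of the concept: `runPipeline`, `match`, `sortBy`, and `limit` are hypothetical helper names, not MongoDB APIs, and the sample orders are made up.

```javascript
// Plain-JavaScript model of a pipeline: each stage is a function that takes
// an array of documents and returns a new array of documents.
// runPipeline and the stage helpers are hypothetical names, not MongoDB APIs.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Stage helpers that mimic $match, $sort, and $limit on in-memory arrays.
const match = (predicate) => (docs) => docs.filter(predicate);
const sortBy = (field, direction) => (docs) =>
  [...docs].sort((a, b) => direction * (a[field] - b[field]));
const limit = (n) => (docs) => docs.slice(0, n);

// Made-up sample documents.
const sampleOrders = [
  { orderId: 1, status: "shipped", amount: 120 },
  { orderId: 2, status: "pending", amount: 300 },
  { orderId: 3, status: "shipped", amount: 450 },
  { orderId: 4, status: "shipped", amount: 80 },
];

// The output of each stage becomes the input of the next one.
const topShipped = runPipeline(sampleOrders, [
  match((doc) => doc.status === "shipped"),
  sortBy("amount", -1),
  limit(2),
]);

console.log(topShipped.map((doc) => doc.orderId)); // [ 3, 1 ]
```

Because every stage only sees what the previous stage produced, placing a filter early usually means later stages have fewer documents to handle.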
8.2) Common Aggregation Stages
$match – Filters documents based on conditions (similar to find).
$group – Groups documents by a specified field and performs aggregation (sum, avg, count, etc.).
$project – Reshapes documents by including/excluding fields or computing new ones.
$sort – Sorts documents based on a specified field.
$limit – Limits the number of documents.
$skip – Skips a specified number of documents.
$unwind – Deconstructs an array field into multiple documents.
$lookup – Performs a left outer join with another collection.
$addFields – Adds new fields to documents.
$facet – Runs multiple aggregation pipelines in a single query.
Example For Aggregation Pipeline:
db.orders.aggregate([
  { $match: { status: "shipped" } },
  { $group: { _id: "$customerId", totalAmount: { $sum: "$amount" } } },
  { $sort: { totalAmount: -1 } },
  { $limit: 5 }
])
Explanation of the above example:
The pipeline first keeps only orders whose status is "shipped", then groups them by customerId and sums amount into totalAmount. It then sorts the groups by totalAmount in descending order, and $limit returns only the top 5 customers.
Performance Considerations:
8.3) Here’s an advanced example demonstrating multiple aggregation stages in MongoDB.
Scenario: Let's say we have a sales collection in which each document has a status field and an items array, and each item carries product, quantity, and price. We want to keep only completed sales, unwind the items array into separate documents, calculate total revenue per product, sort products by total revenue, and show only the top 5 products.
Query:
db.sales.aggregate([
  { $match: { status: "completed" } },
  { $unwind: "$items" },
  {
    $group: {
      _id: "$items.product",
      totalRevenue: { $sum: { $multiply: ["$items.quantity", "$items.price"] } },
      totalUnitsSold: { $sum: "$items.quantity" }
    }
  },
  { $sort: { totalRevenue: -1 } },
  { $limit: 5 }
])
Explanation of the above query:
totalRevenue = sum of quantity * price over all sold items of the product.
totalUnitsSold = sum of quantities sold.
Sample output for the above query:
[
{ "_id": "Laptop", "totalRevenue": 50000, "totalUnitsSold": 50 },
{ "_id": "Mouse", "totalRevenue": 1500, "totalUnitsSold": 60 },
{ "_id": "Keyboard", "totalRevenue": 1200, "totalUnitsSold": 40 }
]
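To see what $unwind and $group are doing, the same steps can be traced in plain JavaScript on an in-memory array. This is an illustrative sketch, not how the server implements these stages, and the `sales` data below is made up (it is not the data behind the sample output above).

```javascript
// Made-up sales documents shaped like the ones the query expects.
const sales = [
  {
    status: "completed",
    items: [
      { product: "Laptop", quantity: 1, price: 1000 },
      { product: "Mouse", quantity: 2, price: 25 },
    ],
  },
  { status: "completed", items: [{ product: "Mouse", quantity: 3, price: 25 }] },
  { status: "cancelled", items: [{ product: "Laptop", quantity: 5, price: 1000 }] },
];

// $match + $unwind: keep completed sales, then emit one entry per item.
const unwoundItems = sales
  .filter((sale) => sale.status === "completed")
  .flatMap((sale) => sale.items);

// $group: accumulate revenue and units per product.
const totalsByProduct = new Map();
for (const { product, quantity, price } of unwoundItems) {
  const totals = totalsByProduct.get(product) ?? { totalRevenue: 0, totalUnitsSold: 0 };
  totals.totalRevenue += quantity * price;
  totals.totalUnitsSold += quantity;
  totalsByProduct.set(product, totals);
}

// $sort + $limit: highest revenue first, top 5 only.
const topProducts = [...totalsByProduct]
  .map(([product, totals]) => ({ _id: product, ...totals }))
  .sort((a, b) => b.totalRevenue - a.totalRevenue)
  .slice(0, 5);

console.log(topProducts);
```

The cancelled sale never reaches the grouping step, which mirrors why $match is placed first in the query.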
8.4) Let's explore $lookup, which is MongoDB's equivalent of SQL JOIN, using an advanced example.
Scenario: We have two collections, customers and orders. We want to retrieve customers along with their orders and show the total order value per customer.
A document in the customers collection (the ObjectId values here are shortened placeholders; real ObjectIds are 24-character hex strings):
{
"_id": ObjectId("C001"),
"name": "Stacy Doe",
"email": "[email protected]"
}
A document in the orders collection:
{
"_id": ObjectId("O1001"),
"customerId": ObjectId("C001"),
"items": [
{ "product": "Laptop", "quantity": 1, "price": 1200 },
{ "product": "Mouse", "quantity": 2, "price": 30 }
],
"orderDate": ISODate("2024-03-10T12:00:00Z"),
"status": "shipped"
}
Query using $lookup:
db.customers.aggregate([
  {
    $lookup: {
      from: "orders",             // The collection to join
      localField: "_id",          // The field from 'customers'
      foreignField: "customerId", // The field from 'orders'
      as: "customerOrders"        // Output array field
    }
  },
  {
    $unwind: "$customerOrders" // Flatten the array to process each order separately
  },
  {
    $group: {
      _id: "$_id", // Group by the customer's _id
      name: { $first: "$name" },
      email: { $first: "$email" },
      totalSpent: {
        $sum: {      // Accumulator: adds up the totals of all orders in the group
          $sum: {    // Expression: adds up the line totals of one order
            $map: {
              input: "$customerOrders.items",
              as: "item",
              in: { $multiply: ["$$item.quantity", "$$item.price"] }
            }
          }
        }
      }
    }
  },
  { $sort: { totalSpent: -1 } } // Sort by highest spending customers
])
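Explanation of the above query:
1. $lookup: Joins customers with orders on customers._id = orders.customerId and stores the matching orders in the customerOrders array.
2. $unwind: Expands the customerOrders array so that each order is processed separately. Customers with no orders are dropped at this stage unless preserveNullAndEmptyArrays is set to true.
3. $group: Groups by the customer's _id, keeps the first name and email seen, and calculates:
totalSpent: Multiplies quantity by price for each item in an order, sums the items, and then sums across the customer's orders.
4. $sort: Orders customers by total spending in descending order.
Sample output for the above query: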
[
{
"_id": "C001",
"name": "Stacy Doe",
"email": "[email protected]",
"totalSpent": 1260
}
]
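The totalSpent value can be checked by hand. The plain-JavaScript sketch below repeats the $map and inner $sum steps on the sample order's items; `map` and `reduce` stand in for the aggregation operators and are not MongoDB calls.

```javascript
// The sample order from the orders collection (only the fields needed here).
const order = {
  items: [
    { product: "Laptop", quantity: 1, price: 1200 },
    { product: "Mouse", quantity: 2, price: 30 },
  ],
};

// $map: turn each item into its line total (quantity * price).
const lineTotals = order.items.map((item) => item.quantity * item.price);

// Inner $sum: add up the line totals of one order.
const orderTotal = lineTotals.reduce((sum, value) => sum + value, 0);

console.log(lineTotals); // [ 1200, 60 ]
console.log(orderTotal); // 1260
```

With a single order per customer, the outer $sum in $group has only this one value to add, so totalSpent is 1260.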
8.5) Filtering Orders by Date
Scenario: To get only customers who placed at least one order in the last 30 days, add a $match stage after $lookup. The cutoff date is hard-coded here for illustration. Because the filter runs before $unwind, it selects customers rather than individual orders, so totalSpent still sums all orders of each matching customer.
Query:
db.customers.aggregate([
  {
    $lookup: {
      from: "orders",             // The collection to join
      localField: "_id",          // The field from 'customers'
      foreignField: "customerId", // The field from 'orders'
      as: "customerOrders"        // Output array field
    }
  },
  {
    // Keep customers that have at least one order on or after the cutoff date
    $match: { "customerOrders.orderDate": { $gte: ISODate("2024-02-10T00:00:00Z") } }
  },
  {
    $unwind: "$customerOrders" // Flatten the array to process each order separately
  },
  {
    $group: {
      _id: "$_id", // Group by the customer's _id
      name: { $first: "$name" },
      email: { $first: "$email" },
      totalSpent: {
        $sum: {
          $sum: {
            $map: {
              input: "$customerOrders.items",
              as: "item",
              in: { $multiply: ["$$item.quantity", "$$item.price"] }
            }
          }
        }
      }
    }
  },
  { $sort: { totalSpent: -1 } } // Sort by highest spending customers
])
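A fixed date stops meaning "the last 30 days" as soon as time moves on. One possible way to compute the cutoff at run time is sketched below in plain JavaScript; `daysAgo` and `recentOrdersMatch` are hypothetical names introduced for this example, and a JavaScript Date can be used wherever the query uses ISODate.

```javascript
// Milliseconds in one day.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// daysAgo is a hypothetical helper: it returns the Date `days` days before `now`.
const daysAgo = (days, now = new Date()) =>
  new Date(now.getTime() - days * MS_PER_DAY);

// A $match stage built from the computed cutoff instead of a hard-coded date.
const recentOrdersMatch = {
  $match: { "customerOrders.orderDate": { $gte: daysAgo(30) } },
};

// With a fixed "now", the result is reproducible:
const cutoff = daysAgo(30, new Date("2024-03-10T12:00:00Z"));
console.log(cutoff.toISOString()); // 2024-02-09T12:00:00.000Z
```

The `recentOrdersMatch` object could then take the place of the $match stage in the pipeline above.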
9) Query Operators
$gt: Greater than. Example: { age: { $gt: 25 } }
$lt: Less than. Example: { age: { $lt: 30 } }
$gte: Greater than or equal. Example: { age: { $gte: 18 } }
$lte: Less than or equal. Example: { age: { $lte: 60 } }
$eq: Equals. Example: { city: { $eq: "NY" } }
$ne: Not equals. Example: { city: { $ne: "LA" } }
$in: Matches any value in the array. Example: { city: { $in: ["NY", "LA"] } }
$nin: Matches none of the values in the array. Example: { city: { $nin: ["NY", "LA"] } }
$exists: Field exists or not. Example: { state: { $exists: true } }
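As a rough mental model of how these operators read, the sketch below evaluates them against a plain JavaScript object. It is a deliberate simplification: `operators` and `matches` are hypothetical names, and the real server also handles arrays, nested paths, and cross-type comparison rules that this sketch ignores.

```javascript
// Hypothetical helper table: one comparison function per query operator.
// `value` is the document's field value, `arg` is the operator's argument.
const operators = {
  $gt: (value, arg) => value > arg,
  $lt: (value, arg) => value < arg,
  $gte: (value, arg) => value >= arg,
  $lte: (value, arg) => value <= arg,
  $eq: (value, arg) => value === arg,
  $ne: (value, arg) => value !== arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
  $exists: (value, arg) => (value !== undefined) === arg,
};

// A document matches when every operator on every listed field holds.
const matches = (doc, filter) =>
  Object.entries(filter).every(([field, conditions]) =>
    Object.entries(conditions).every(([op, arg]) => operators[op](doc[field], arg))
  );

const alice = { name: "Alice", age: 25, city: "NY" };

console.log(matches(alice, { age: { $gt: 25 } }));            // false
console.log(matches(alice, { age: { $gte: 18, $lte: 60 } })); // true
console.log(matches(alice, { city: { $in: ["NY", "LA"] } })); // true
console.log(matches(alice, { state: { $exists: true } }));    // false
```

Several operators can be combined on one field, as in the $gte/$lte range check, and all of them have to hold for the document to match.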
10) Array Operators
11) Transactions (MongoDB 4.0+)
MongoDB 4.0 introduced multi-document transactions, which allow atomic operations across multiple documents and collections within a replica set. MongoDB 4.2 extended them to sharded clusters.
Key Concepts of Transactions in MongoDB
Sample Query:
const session = db.getMongo().startSession();
try {
  session.startTransaction();
  // Collections obtained through the session take part in its transaction
  const usersCollection = session.getDatabase("mydb").users;
  const ordersCollection = session.getDatabase("mydb").orders;
  usersCollection.updateOne({ _id: 1 }, { $set: { balance: 500 } });
  ordersCollection.insertOne({ orderId: 101, amount: 500 });
  session.commitTransaction(); // Both writes are applied together
} catch (error) {
  print("Transaction failed: " + error);
  session.abortTransaction(); // Roll back both writes
} finally {
  session.endSession();
}
Key Notes
12) User & Role Management
User and Role Management in MongoDB involves defining users, assigning roles, and managing permissions to ensure secure access control. MongoDB uses Role-Based Access Control (RBAC) to manage user privileges.
Sample Query:
db.createUser({
user: "admin",
pwd: "password123",
roles: [{ role: "readWrite", db: "myDatabase" }]
})
db.getUsers() // List users
db.dropUser("admin") // Delete user
13) Backup & Restore
Backing up and restoring MongoDB is essential for data recovery, disaster management, and migrations. MongoDB provides several methods to back up and restore databases, including mongodump/mongorestore, file system snapshots, and oplog backups.
Sample commands (run from the operating system shell, not from the MongoDB shell):
# Backup MongoDB Database
mongodump --db=myDatabase --out=/backup/
# Restore MongoDB Database
mongorestore --db=myDatabase /backup/myDatabase/
Conclusion:
When to Use MongoDB?
Schema-less & flexible data
High write throughput
Real-time big data applications
Scaling horizontally (sharding)
JSON-like document storage