Unlocking Data Secrets with MongoDB Aggregation Pipelines: Day 2 Adventures in Grouping
Abdullah Rizwan
Data Scientist | AI/ML | Google-certified data analyst | PostgreSQL | Power BI | FastAPI | Python | API Architect | Backend Developer
Day two of my MongoDB aggregation pipeline journey was all about grouping! The $group stage sorts documents into buckets that share a key, such as the value of a field, and then computes summary values (averages, counts, sums) for each bucket. It's particularly useful when you want aggregate figures for distinct subsets of your data.
Today, I started with two age-related examples. First, I wanted to find the average age of all users in my database. The aggregation pipeline I used is straightforward: a single $group stage.
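A minimal sketch of that pipeline in Python, written in the list-of-dicts form that PyMongo's Collection.aggregate() accepts. The sample documents, the output field name averageAge, and the emulate_avg_age helper (which mimics the stage in plain Python so the example runs without a MongoDB server) are illustrative assumptions, not the original code.

```python
# Hypothetical sample documents; the values are invented for illustration.
users = [
    {"name": "Ana", "age": 25},
    {"name": "Ben", "age": 35},
    {"name": "Chen", "age": 45},
    {"name": "Dee"},  # no "age" field: $avg skips it
]

# _id: None (null in mongosh) puts every document into a single group.
# The output field name "averageAge" is an assumption.
avg_age_pipeline = [
    {"$group": {"_id": None, "averageAge": {"$avg": "$age"}}}
]


def emulate_avg_age(docs):
    """Mimic the $group stage above in plain Python (no server needed)."""
    if not docs:
        return []  # $group emits no group when there are no input documents
    # $avg only considers numeric values; missing or non-numeric ages are ignored.
    ages = [
        d["age"]
        for d in docs
        if isinstance(d.get("age"), (int, float)) and not isinstance(d.get("age"), bool)
    ]
    average = sum(ages) / len(ages) if ages else None  # null if nothing numeric
    return [{"_id": None, "averageAge": average}]


print(emulate_avg_age(users))  # [{'_id': None, 'averageAge': 35.0}]

# Against a real server (the "db.users" collection is hypothetical):
# print(list(db.users.aggregate(avg_age_pipeline)))
```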
The $group stage uses a constant group key (such as _id: null), so every document lands in the same bucket, and the $avg accumulator computes the mean of the "age" field across all of them. The result is the overall average age of the entire user base.
Next, I wanted to see how average age varies by gender. Here, I grouped documents by the gender field ($gender) and then calculated the average age within each group.
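The only change from the previous pipeline is the group key: "$gender" instead of a constant, so MongoDB emits one output document per distinct gender value. As before, this is a sketch; the sample data, the averageAge field name, and the emulate_avg_age_by_gender helper are hypothetical stand-ins used so the example runs without a server.

```python
from collections import defaultdict

# Hypothetical sample documents; the values are invented for illustration.
users = [
    {"gender": "female", "age": 25},
    {"gender": "female", "age": 35},
    {"gender": "male", "age": 40},
    {"gender": "male", "age": 20},
    {"gender": "male", "age": 33},
]

# "$gender" as the group key yields one output document per distinct gender.
avg_age_by_gender_pipeline = [
    {"$group": {"_id": "$gender", "averageAge": {"$avg": "$age"}}}
]


def emulate_avg_age_by_gender(docs):
    """Mimic the $group stage above in plain Python (no server needed)."""
    buckets = defaultdict(list)
    for d in docs:
        bucket = buckets[d.get("gender")]  # a missing gender groups under None (null)
        age = d.get("age")
        if isinstance(age, (int, float)) and not isinstance(age, bool):
            bucket.append(age)
    return [
        {"_id": key, "averageAge": sum(ages) / len(ages) if ages else None}
        for key, ages in buckets.items()
    ]


# MongoDB does not guarantee the order of $group output, so index results by _id.
by_gender = {g["_id"]: g["averageAge"] for g in emulate_avg_age_by_gender(users)}
print(by_gender)  # {'female': 30.0, 'male': 31.0}
```

If you need a stable order in the real pipeline, appending a stage such as {"$sort": {"_id": 1}} after $group makes the output deterministic.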
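The result shows how average age differs between the gender groups, for example whether male users skew older or younger than female users. Keep in mind that a gap between two averages doesn't, on its own, prove a statistically significant difference.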
Furthermore, I explored finding the most popular favorite fruit among users. This involved grouping documents by favorite fruit ($favoriteFruit) and counting the documents in each group ($sum: 1). Finally, sorting by that count in descending order ($sort) put the top favorite fruit first.
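Here is a sketch of the counting pattern: {"$sum": 1} adds 1 per document in a group, and {"$sort": {"count": -1}} orders the groups from most to least common, so the first result is the favorite. The sample documents, the count field name, and the emulate_fruit_counts helper are illustrative assumptions.

```python
from collections import Counter

# Hypothetical sample documents; the values are invented for illustration.
users = [
    {"favoriteFruit": "banana"},
    {"favoriteFruit": "apple"},
    {"favoriteFruit": "banana"},
    {"favoriteFruit": "strawberry"},
    {"favoriteFruit": "banana"},
    {"favoriteFruit": "apple"},
]

fruit_pipeline = [
    # One group per fruit; {"$sum": 1} counts the documents in each group.
    {"$group": {"_id": "$favoriteFruit", "count": {"$sum": 1}}},
    # -1 means descending: the most common fruit comes first.
    {"$sort": {"count": -1}},
]


def emulate_fruit_counts(docs):
    """Mimic the $group + $sort stages above in plain Python (no server needed)."""
    counts = Counter(d.get("favoriteFruit") for d in docs)
    groups = [{"_id": fruit, "count": n} for fruit, n in counts.items()]
    return sorted(groups, key=lambda g: g["count"], reverse=True)


result = emulate_fruit_counts(users)
print(result[0])  # {'_id': 'banana', 'count': 3}
```

MongoDB also offers a $sortByCount stage that bundles this exact group-count-sort pattern into one step, e.g. {"$sortByCount": "$favoriteFruit"}, which produces documents with _id and count fields.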
And the exploration doesn't stop at age and appetite! Grouping works on any field you can reference, including nested ones. I was curious about the global reach of my user base, so I set out to discover the countries with the most registered users. Using the nested field path $company.location.country as the grouping key, I chained three stages: $group counted the users in each country, $sort ordered those counts from highest to lowest, and $limit kept the top 5 contenders (you can raise the $limit value to peek at even more!).
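A sketch of the three-stage pipeline, assuming each user document nests the country as company.location.country. The sample data, the userCount field name, the get_path and emulate_top_countries helpers, and the secondary "_id": 1 sort key are my own additions: the tie-breaker keeps the top 5 deterministic when several countries share a count, since MongoDB does not guarantee an order for tied sort values.

```python
from collections import Counter

# Hypothetical sample documents; the countries are invented for illustration.
countries = ["Germany", "France", "Germany", "Italy", "Spain", "France",
             "Germany", "Japan", "Italy", "Brazil", "Canada"]
users = [{"company": {"location": {"country": c}}} for c in countries]

country_pipeline = [
    # The dotted path reaches into the nested sub-documents.
    {"$group": {"_id": "$company.location.country", "userCount": {"$sum": 1}}},
    # Highest count first; _id ascending breaks ties (an added assumption).
    {"$sort": {"userCount": -1, "_id": 1}},
    # Keep only the first five groups; raise this number to see more.
    {"$limit": 5},
]


def get_path(doc, dotted):
    """Follow a dotted path like 'company.location.country'; None if missing."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value


def emulate_top_countries(docs, limit=5):
    """Mimic $group + $sort + $limit in plain Python (no server needed)."""
    counts = Counter(get_path(d, "company.location.country") for d in docs)
    groups = [{"_id": k, "userCount": n} for k, n in counts.items()]
    # Sort by count descending, then _id ascending; a None (null) key sorts
    # before strings, as null precedes strings in MongoDB's comparison order.
    groups.sort(key=lambda g: (-g["userCount"], g["_id"] is not None, g["_id"] or ""))
    return groups[:limit]


print([g["_id"] for g in emulate_top_countries(users)])
# ['Germany', 'France', 'Italy', 'Brazil', 'Canada']
```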