Data Grouping and Aggregations in Pandas: Unlock Actionable Insights from Your Data
Grouping and aggregations are essential techniques in data analysis, enabling us to segment and summarize datasets efficiently. With Pandas, you can calculate total sales per representative, average commission per model, and more, making it a powerful tool for data-driven decision-making.
In this guide, we will cover:
By the end, you’ll be able to use these techniques to analyze large datasets with ease.
Why This Module is Important
Grouping and aggregations allow you to:
Without proper grouping and aggregation, analyzing large datasets becomes complex and inefficient.
You can download the datasets from the following GitHub link: GitHub Datasets
Understanding Grouping and Aggregations in Pandas
Why Do We Need Grouping?
Raw data consists of thousands of records that need segmentation and summarization for analysis. Grouping enables us to:
How to Perform Grouping and Aggregations
1. Grouping Data by Sales Representative
To analyze total sales per representative, we use groupby() and sum():
import pandas as pd
# Load the Toyota sales dataset
toyota_sales_data = pd.read_csv("data/car_sales/toyota_sales_data.csv")
# Group by sales rep ID and sum the sale amount
toyota_sales_data.groupby("sale_rep_id")["sale_amount"].sum()
Explanation:
2. Getting Unique Sales Rep IDs
Before performing a group-by operation, it's useful to check unique values in the dataset:
unique_sales_reps = toyota_sales_data["sale_rep_id"].unique()
print(unique_sales_reps)
If the dataset contains 20 sales reps and 5,000 sales transactions, the grouped results will contain 20 records, one per representative.
3. Calculating the Average Sale Amount per Car Model
To analyze average sales per car model, use mean():
toyota_sales_data.groupby("car_model")["sale_amount"].mean()
toyota_sales_data["sale_status"].unique()
Explanation:
领英推荐
4. Applying Multiple Aggregations with agg()
To calculate both total and average sales per sales rep, use agg():
toyota_sales_data.groupby("sale_rep_id")["sale_amount"]. \
agg(["sum", "mean"])
5. Grouping Data by Sale Status
To analyze total and average sales by sale status (Completed, Pending, Canceled):
sales_by_status = toyota_sales_data.groupby("sale_status")["sale_amount"].agg(["sum", "mean"])
print(sales_by_status)
Explanation:
Exercise for You
Try this:
Here’s a hint:
car_model_summary = toyota_sales_data.groupby("car_model")["sale_amount"].agg(["sum", "min", "max"])
print(car_model_summary)
Practice Assignment
?? Want to practice? Attempt the Grouping and Aggregations in Pandas Assignment
?? Click Here.
?? Need help? Leave a comment, and we’ll assist you!
What’s Next?
Now that we understand the basics of grouping and aggregations, we’ll move on to Advanced Aggregations using Pandas. We'll also explore renaming aggregated columns for better clarity and handling missing data in grouped results. By the end of this module, you will master data summarization techniques for business reporting and analytics.
Click ?? to Enroll in the Python for Beginners: Learn Python with Hands-on Projects. It only costs $10 and you can reach out to us for $10 Coupon.
Conclusion
In this guide, we covered:
Grouping and aggregation are fundamental data processing techniques that enable businesses to analyze trends, make data-driven decisions, and automate reporting workflows.
?? Engage With Us!
? Authored by Siva Kalyan Geddada , Abhinav Sai Penmetsa
?? Share this article with anyone interested in data engineering, Python, or data analysis. ?? Have questions or need help? Comment below! Let's discuss.
?? Follow us for more hands-on data science tutorials!