Advanced Custom Aggregation Functions in Pandas
Master Multi-Step Calculations for Data Analysis
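In this lecture, we will explore advanced custom aggregation functions in Pandas. These functions let us perform multi-step calculations, such as ratios, percentages, and conditional aggregations, that no single built-in aggregation like sum or count expresses directly.
By the end of this article, you’ll learn how to write a custom aggregation function, test it on a single group, apply it across all groups with groupby() and agg(), and combine it with built-in aggregations.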
Let’s dive in!
Why This Module is Important
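Advanced custom aggregation functions are crucial when a metric takes more than one step to compute, such as a ratio, a percentage, or a conditional count.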
You can download the datasets from the following GitHub link: GitHub Datasets
Step 1: Load the Dataset
Let’s start by loading the dataset:
import pandas as pd
# Load the dataset
toyota_sales_data = pd.read_csv("data/car_sales/toyota_sales_data.csv")
# Preview the dataset
print(toyota_sales_data.head())
Step 2: Creating an Advanced Custom Aggregation Function
Problem Statement:
We want to calculate the percentage of completed sales for each Sales Rep ID.
Let’s define a function that computes completed sales percentage:
# Define a custom function to calculate completed sales percentage
def completed_sales_percentage(sale_status_series):
    """Calculate the percentage of completed sales."""
    # Total number of rows in the group (missing values included)
    total_sales = len(sale_status_series)
    # Comparing to "Completed" gives booleans; summing them counts the True values
    completed_sales = (sale_status_series == "Completed").sum()
    return (completed_sales / total_sales) * 100
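To make the arithmetic concrete, here is a minimal sketch that runs the function on a tiny hand-built Series. The status values below are hypothetical and do not come from the Toyota dataset; the function is repeated so the snippet runs on its own.

```python
import pandas as pd


def completed_sales_percentage(sale_status_series):
    """Calculate the percentage of completed sales."""
    total_sales = len(sale_status_series)
    completed_sales = (sale_status_series == "Completed").sum()
    return (completed_sales / total_sales) * 100


# Hypothetical statuses for one sales rep (illustration only)
sample_status = pd.Series(["Completed", "Pending", "Completed", "Completed"])

sample_pct = completed_sales_percentage(sample_status)
print(sample_pct)  # 3 completed out of 4 rows -> 75.0
```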
Step 3: Testing the Function on a Single Sales Rep ID
To validate our function, let’s filter data for a specific Sales Rep (ID: 8) and apply the function:
# Filter data for Sales Rep ID 8
sales_by_rep_8 = toyota_sales_data.query("sale_rep_id == 8")
# Apply the function to calculate completed sales percentage
completed_pct = completed_sales_percentage(sales_by_rep_8["sale_status"])
print(completed_pct)
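If the CSV is not at hand, the same check can be sketched on a small made-up DataFrame. Only the column names sale_rep_id and sale_status come from the article; the rows and the "Cancelled" and "Pending" statuses are assumptions for illustration. The sketch also shows that query() and a boolean mask select the same rows.

```python
import pandas as pd


def completed_sales_percentage(sale_status_series):
    """Calculate the percentage of completed sales."""
    total_sales = len(sale_status_series)
    completed_sales = (sale_status_series == "Completed").sum()
    return (completed_sales / total_sales) * 100


# Hypothetical rows; only the column names follow the article
demo_sales = pd.DataFrame({
    "sale_rep_id": [8, 8, 8, 9, 9],
    "sale_status": ["Completed", "Cancelled", "Completed", "Completed", "Pending"],
})

# Two equivalent ways to keep only Sales Rep 8
rep_8_query = demo_sales.query("sale_rep_id == 8")
rep_8_mask = demo_sales[demo_sales["sale_rep_id"] == 8]
same_rows = rep_8_query.equals(rep_8_mask)
print(same_rows)

# 2 of the 3 rows for rep 8 are completed
pct_rep_8 = completed_sales_percentage(rep_8_query["sale_status"])
print(round(pct_rep_8, 2))  # 66.67
```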
Step 4: Applying the Custom Function Across All Sales Reps
Now, let’s compute completed sales percentage for all Sales Reps using groupby() and agg():
# Apply the custom function to calculate completed sales percentage for all sales reps
completed_sales_agg = toyota_sales_data.groupby("sale_rep_id")["sale_status"].agg(
    total_sales="count",
    completed_sales_pct=completed_sales_percentage
)
# Display the results
print(completed_sales_agg)
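This table shows, for each Sales Rep ID, the number of sales records (total_sales) and the percentage of those records with a Completed status (completed_sales_pct).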
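The shape of that result can be previewed with a minimal sketch on made-up data. The rows are hypothetical; the column names and the agg() call mirror the step above. On a single selected column, each keyword argument names an output column and its value is the aggregation to run.

```python
import pandas as pd


def completed_sales_percentage(sale_status_series):
    """Calculate the percentage of completed sales."""
    total_sales = len(sale_status_series)
    completed_sales = (sale_status_series == "Completed").sum()
    return (completed_sales / total_sales) * 100


# Hypothetical rows; only the column names follow the article
demo_sales = pd.DataFrame({
    "sale_rep_id": [8, 8, 8, 9, 9],
    "sale_status": ["Completed", "Cancelled", "Completed", "Completed", "Pending"],
})

# keyword = output column name, value = built-in name or custom function
demo_agg = demo_sales.groupby("sale_rep_id")["sale_status"].agg(
    total_sales="count",
    completed_sales_pct=completed_sales_percentage,
)
print(demo_agg)
```

One detail to keep in mind: "count" skips missing values, while len() inside the custom function does not, so the two can disagree if sale_status contains NaN.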
Step 5: Combining Custom Functions with Built-in Aggregations
Now, let’s calculate three metrics for each Sales Rep: the total sale amount, the number of sales, and the percentage of completed sales.
# Compute multiple metrics with built-in and custom aggregations
sales_rep_summary = toyota_sales_data.groupby("sale_rep_id").agg(
    total_sales=("sale_amount", "sum"),
    total_sales_count=("sale_id", "count"),
    completed_sales_pct=("sale_status", completed_sales_percentage)
)
# Display the results
print(sales_rep_summary)
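Explanation: Each keyword argument names an output column and maps it to a (column, aggregation) pair. total_sales sums sale_amount, total_sales_count counts sale_id values, and completed_sales_pct applies our custom function to sale_status. Note that total_sales here is a summed amount, whereas in Step 4 the same name held a row count.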
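The pattern can be sketched on made-up data as follows. The rows and amounts are hypothetical; the column names sale_id, sale_rep_id, sale_amount, and sale_status are the ones used in the step above.

```python
import pandas as pd


def completed_sales_percentage(sale_status_series):
    """Calculate the percentage of completed sales."""
    total_sales = len(sale_status_series)
    completed_sales = (sale_status_series == "Completed").sum()
    return (completed_sales / total_sales) * 100


# Hypothetical rows; only the column names follow the article
demo_sales = pd.DataFrame({
    "sale_id": [101, 102, 103, 104, 105],
    "sale_rep_id": [8, 8, 8, 9, 9],
    "sale_amount": [20000.0, 15000.0, 30000.0, 25000.0, 18000.0],
    "sale_status": ["Completed", "Cancelled", "Completed", "Completed", "Pending"],
})

# Each output column = (input column, built-in name or custom function)
demo_summary = demo_sales.groupby("sale_rep_id").agg(
    total_sales=("sale_amount", "sum"),
    total_sales_count=("sale_id", "count"),
    completed_sales_pct=("sale_status", completed_sales_percentage),
)
print(demo_summary)
```

In this sketch the summed amount includes sales of every status, not only completed ones, which matches how the aggregation in this step is written.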
Best Practices for Using Advanced Custom Aggregation Functions
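One practice worth considering, offered here as a suggestion rather than as the authors' own recommendation: when a metric can be written with vectorized operations, that form is often faster than a Python function called once per group. A minimal sketch, assuming the same column names and using hypothetical rows, computes the completed percentage as the mean of a boolean flag:

```python
import pandas as pd

# Hypothetical rows; only the column names follow the article
demo_sales = pd.DataFrame({
    "sale_rep_id": [8, 8, 8, 9, 9],
    "sale_status": ["Completed", "Cancelled", "Completed", "Completed", "Pending"],
})

# Boolean flag: True where the sale is completed
is_completed = demo_sales["sale_status"].eq("Completed")

# The mean of a boolean column is the fraction of True values per group
vectorized_pct = is_completed.groupby(demo_sales["sale_rep_id"]).mean() * 100
print(vectorized_pct)
```

A custom function remains the clearer choice when the logic has several steps that do not reduce to a single built-in aggregation.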
Practice Assignment
Want to practice? Attempt the Implementing Complex Custom Aggregation Functions in Pandas Assignment. Click Here.
Need help? Leave a comment, and we’ll assist you!
What’s Next?
In the next lecture, we will explore Mastering Row-Level Transformations in Pandas with apply(). This is a crucial technique for performing complex transformations and calculations on individual rows of a DataFrame.
Click to enroll in Python for Beginners: Learn Python with Hands-on Projects. It costs only $10, and you can reach out to us for a $10 coupon.
Conclusion
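In this article, you learned how to define a custom aggregation function, validate it on a single Sales Rep, apply it to every group with groupby() and agg(), and combine it with built-in aggregations in one summary.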
Custom aggregations help unlock deeper insights in data analysis. Mastering them will take your data transformation skills to the next level!
Engage With Us!
Authored by Siva Kalyan Geddada, Abhinav Sai Penmetsa
Share this article with anyone interested in data engineering, Python, or data analysis. Have questions or need help? Comment below! Let’s discuss.
Follow us for more hands-on data science tutorials!