Advanced Custom Aggregation Functions in Pandas


Master Multi-Step Calculations for Data Analysis

In this lecture, we will explore advanced custom aggregation functions in Pandas. These functions let us perform multi-step calculations, such as ratios, percentages, and conditional aggregations, that a single built-in aggregation like sum or count cannot express on its own.

By the end of this article, you’ll learn:

  • How to create advanced custom functions
  • How to calculate percentages within grouped data
  • How to integrate custom functions with built-in aggregations

Let’s dive in!



Why This Module is Important

Advanced custom aggregation functions are crucial when:

  • Built-in functions cannot perform the required calculation.
  • You need to calculate ratios, percentages, or conditional metrics.
  • Business logic requires complex aggregations within grouped data.


You can download the datasets from the following GitHub link: GitHub Datasets

Step 1: Load the Dataset

Let’s start by loading the dataset:

import pandas as pd

# Load the dataset
toyota_sales_data = pd.read_csv("data/car_sales/toyota_sales_data.csv")

# Preview the dataset
print(toyota_sales_data.head())        
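If you don't have the CSV at hand, you can follow along with a small toy DataFrame. The column names below (sale_id, sale_rep_id, sale_amount, sale_status) are assumptions inferred from the code in the later steps, and every value, including the "Pending" and "Cancelled" statuses, is made up for illustration rather than taken from the real Toyota dataset.

```python
import pandas as pd

# Hypothetical toy data: column names are inferred from the later steps,
# and every value is invented purely for illustration.
toyota_sales_data = pd.DataFrame(
    {
        "sale_id": [1, 2, 3, 4, 5, 6, 7],
        "sale_rep_id": [8, 8, 8, 8, 3, 3, 5],
        "sale_amount": [25000.0, 31000.0, 18000.0, 27000.0, 22000.0, 40000.0, 35000.0],
        "sale_status": ["Completed", "Completed", "Pending", "Cancelled",
                        "Completed", "Pending", "Completed"],
    }
)

# Preview the toy dataset, just as we would with the real CSV
print(toyota_sales_data.head())
```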

Step 2: Creating an Advanced Custom Aggregation Function

Problem Statement:

We want to calculate the percentage of completed sales for each Sales Rep ID.

Let’s define a function that computes the completed sales percentage:

# Define a custom function to calculate completed sales percentage
def completed_sales_percentage(sale_status_series):
    """Calculate the percentage of completed sales."""
    total_sales = len(sale_status_series)
    completed_sales = (sale_status_series == "Completed").sum()
    return (completed_sales / total_sales) * 100        
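Before grouping anything, it helps to try the function on a tiny hand-built Series whose answer is obvious. The statuses below are hypothetical. The second print shows an equivalent one-liner: the mean of a boolean Series is the share of True values, so eq("Completed").mean() * 100 gives the same percentage.

```python
import pandas as pd

def completed_sales_percentage(sale_status_series):
    """Calculate the percentage of completed sales."""
    total_sales = len(sale_status_series)
    completed_sales = (sale_status_series == "Completed").sum()
    return (completed_sales / total_sales) * 100

# Hypothetical statuses for one sales rep: 3 of 4 sales are completed
statuses = pd.Series(["Completed", "Pending", "Completed", "Completed"])

print(completed_sales_percentage(statuses))  # 75.0

# Equivalent one-liner: the mean of a boolean Series is the share of True values
print(statuses.eq("Completed").mean() * 100)  # 75.0
```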

Step 3: Testing the Function on a Single Sales Rep ID

To validate our function, let’s filter data for a specific Sales Rep (ID: 8) and apply the function:

# Filter data for Sales Rep ID 8
sales_by_rep_8 = toyota_sales_data.query("sale_rep_id == 8")

# Apply the function to calculate completed sales percentage
completed_pct = completed_sales_percentage(sales_by_rep_8["sale_status"])

print(completed_pct)        
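A quick way to sanity-check this single-rep result is to compare it with value_counts(normalize=True), which returns each status's share of the rows as a fraction between 0 and 1. The sketch below runs both on a small made-up DataFrame standing in for the real dataset.

```python
import pandas as pd

def completed_sales_percentage(sale_status_series):
    """Calculate the percentage of completed sales."""
    total_sales = len(sale_status_series)
    completed_sales = (sale_status_series == "Completed").sum()
    return (completed_sales / total_sales) * 100

# Made-up sales for illustration only
toyota_sales_data = pd.DataFrame(
    {
        "sale_rep_id": [8, 8, 8, 8, 3],
        "sale_status": ["Completed", "Completed", "Pending", "Cancelled", "Completed"],
    }
)

sales_by_rep_8 = toyota_sales_data.query("sale_rep_id == 8")
completed_pct = completed_sales_percentage(sales_by_rep_8["sale_status"])

# value_counts(normalize=True) gives each status's share of the rows (0..1)
status_shares = sales_by_rep_8["sale_status"].value_counts(normalize=True)
cross_check = status_shares.get("Completed", 0.0) * 100

print(completed_pct, cross_check)  # 50.0 50.0
```

Note that value_counts drops missing values by default, while the custom function's len() counts them, so the two can disagree when some statuses are missing.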

Step 4: Applying the Custom Function Across All Sales Reps

Now, let’s compute the completed sales percentage for all Sales Reps using groupby() and agg():

# Apply the custom function to calculate completed sales percentage for all sales reps
completed_sales_agg = toyota_sales_data.groupby("sale_rep_id")["sale_status"].agg(
    total_sales="count",
    completed_sales_pct=completed_sales_percentage
)

# Display the results
print(completed_sales_agg)        
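Conceptually, agg() splits the sale_status column by sale_rep_id and calls completed_sales_percentage once per group. The sketch below reproduces that split-apply-combine loop by hand on made-up data, which can help when debugging a custom aggregation. It illustrates the idea only; it is not how pandas implements agg() internally.

```python
import pandas as pd

def completed_sales_percentage(sale_status_series):
    """Calculate the percentage of completed sales."""
    total_sales = len(sale_status_series)
    completed_sales = (sale_status_series == "Completed").sum()
    return (completed_sales / total_sales) * 100

# Made-up sales for illustration only
toyota_sales_data = pd.DataFrame(
    {
        "sale_rep_id": [8, 8, 8, 8, 3, 3, 3, 3],
        "sale_status": ["Completed", "Completed", "Pending", "Cancelled",
                        "Completed", "Pending", "Pending", "Pending"],
    }
)

# Built-in path: named aggregation on a SeriesGroupBy
completed_sales_agg = toyota_sales_data.groupby("sale_rep_id")["sale_status"].agg(
    total_sales="count",
    completed_sales_pct=completed_sales_percentage,
)

# Manual path: iterate over (rep_id, Series) pairs and call the function on each
manual = {
    rep_id: completed_sales_percentage(group)
    for rep_id, group in toyota_sales_data.groupby("sale_rep_id")["sale_status"]
}

print(completed_sales_agg)
print(manual)
```

Both paths should produce the same percentages; the manual loop is mainly useful for inspecting one group at a time.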

This table shows:

  • Total sales count for each Sales Rep.
  • Completed Sales Percentage computed using our custom function.
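One subtlety in this table: the built-in "count" skips missing values, while completed_sales_percentage divides by len(), which includes them. If some rows have a missing sale_status, the two columns rest on different denominators. The sketch below uses made-up data with one missing status; completed_sales_percentage_known is a hypothetical variant that assumes you want to ignore rows whose status is unknown.

```python
import numpy as np
import pandas as pd

def completed_sales_percentage(sale_status_series):
    """Original version: the denominator includes missing statuses."""
    total_sales = len(sale_status_series)
    completed_sales = (sale_status_series == "Completed").sum()
    return (completed_sales / total_sales) * 100

def completed_sales_percentage_known(sale_status_series):
    """Hypothetical variant: ignore rows whose status is missing."""
    known = sale_status_series.dropna()
    if known.empty:
        return np.nan  # no known statuses, so the percentage is undefined
    return (known == "Completed").mean() * 100

# Made-up data: sales rep 8 has one sale with a missing status
toyota_sales_data = pd.DataFrame(
    {
        "sale_rep_id": [8, 8, 8, 8],
        "sale_status": ["Completed", "Completed", "Pending", None],
    }
)

summary = toyota_sales_data.groupby("sale_rep_id")["sale_status"].agg(
    total_sales="count",  # 3: the missing status is skipped
    completed_sales_pct=completed_sales_percentage,  # 2 / 4 * 100
    completed_known_pct=completed_sales_percentage_known,  # 2 / 3 * 100
)
print(summary)
```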


Step 5: Combining Custom Functions with Built-in Aggregations

Now, let’s calculate three metrics for each Sales Rep:

  1. Total Sales Amount
  2. Total Sales Count
  3. Completed Sales Percentage

# Compute multiple metrics with built-in and custom aggregations
sales_rep_summary = toyota_sales_data.groupby("sale_rep_id").agg(
    total_sales=("sale_amount", "sum"),
    total_sales_count=("sale_id", "count"),
    completed_sales_pct=("sale_status", completed_sales_percentage)
)

# Display the results
print(sales_rep_summary)        

Explanation:

  • total_sales: Sum of sale amounts for each Sales Rep.
  • total_sales_count: Number of sales transactions for each Sales Rep.
  • completed_sales_pct: Percentage of completed sales (custom function).
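Each function in a named aggregation receives only one column, so a conditional metric that needs two columns, such as revenue from completed sales only, does not fit directly into agg(). One common approach, sketched below on made-up data, is to build a helper column first and aggregate it with a built-in function, then divide two aggregated columns to get a ratio. The completed_amount column and the completed_revenue_pct metric are illustrative additions, not part of the original example.

```python
import pandas as pd

# Made-up sales for illustration only
toyota_sales_data = pd.DataFrame(
    {
        "sale_id": [1, 2, 3, 4, 5],
        "sale_rep_id": [8, 8, 8, 3, 3],
        "sale_amount": [20000.0, 30000.0, 10000.0, 25000.0, 15000.0],
        "sale_status": ["Completed", "Pending", "Completed", "Completed", "Cancelled"],
    }
)

# Hypothetical helper column: keep the amount only for completed sales, 0 otherwise
toyota_sales_data["completed_amount"] = toyota_sales_data["sale_amount"].where(
    toyota_sales_data["sale_status"] == "Completed", 0.0
)

revenue_summary = toyota_sales_data.groupby("sale_rep_id").agg(
    total_sales=("sale_amount", "sum"),
    completed_sales=("completed_amount", "sum"),
)

# Ratio of two aggregated columns: share of revenue from completed sales
revenue_summary["completed_revenue_pct"] = (
    revenue_summary["completed_sales"] / revenue_summary["total_sales"] * 100
)
print(revenue_summary)
```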


Best Practices for Using Advanced Custom Aggregation Functions

  • Use custom functions only when built-in aggregations fall short.
  • Keep custom functions efficient, using vectorized Series operations rather than Python loops, to avoid performance bottlenecks on large datasets.
  • Combine built-in and custom aggregations for comprehensive analysis.
  • Ensure functions handle missing values and edge cases correctly.
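To illustrate the efficiency point: a custom Python function is called once per group, whereas built-in aggregations such as "mean" run in pandas' optimized code paths. For this particular metric, a vectorized alternative is to add a boolean helper column (the name is_completed is hypothetical) and take its mean per group. The sketch below, on made-up data, checks that both approaches agree; actual speedups depend on data size and pandas version, so measure on your own data.

```python
import pandas as pd

def completed_sales_percentage(sale_status_series):
    """Calculate the percentage of completed sales."""
    total_sales = len(sale_status_series)
    completed_sales = (sale_status_series == "Completed").sum()
    return (completed_sales / total_sales) * 100

# Made-up sales for illustration only
toyota_sales_data = pd.DataFrame(
    {
        "sale_rep_id": [8, 8, 8, 8, 3, 3, 3, 3],
        "sale_status": ["Completed", "Completed", "Pending", "Cancelled",
                        "Completed", "Pending", "Pending", "Pending"],
    }
)

# Custom function: Python code runs once per group
custom = toyota_sales_data.groupby("sale_rep_id")["sale_status"].agg(
    completed_sales_percentage
)

# Vectorized alternative: boolean helper column + built-in mean
vectorized = (
    toyota_sales_data.assign(is_completed=toyota_sales_data["sale_status"].eq("Completed"))
    .groupby("sale_rep_id")["is_completed"]
    .mean()
    * 100
)

print(custom)
print(vectorized)
```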


Practice Assignment

Want to practice? Attempt the Implementing Complex Custom Aggregation Functions in Pandas Assignment. Click Here.

Need help? Leave a comment, and we’ll assist you!


What’s Next?

In the next lecture, Mastering Row-Level Transformations in Pandas with apply(), we will explore a crucial technique for performing complex transformations and calculations on individual rows of a DataFrame.


Click here to enroll in Python for Beginners: Learn Python with Hands-on Projects. It costs only $10, and you can reach out to us for a $10 coupon.

Conclusion

In this article, you learned:

  • How to define advanced custom aggregation functions.
  • How to calculate multi-step metrics like completed sales percentage.
  • How to integrate custom and built-in functions for powerful insights.

Custom aggregations help unlock deeper insights in data analysis. Mastering them will take your data transformation skills to the next level!


Engage With Us!

Authored by Siva Kalyan Geddada and Abhinav Sai Penmetsa

Share this article with anyone interested in data engineering, Python, or data analysis. Have questions or need help? Comment below! Let's discuss.

Follow us for more hands-on data science tutorials!


More articles by ITVersity, Inc.
