Monthly Sales Commission Analysis with Pandas - A Complete Workflow

Monthly Sales Commission Analysis with Pandas - A Complete Workflow

Optimizing Sales Performance: A Pandas-Based Sales Commission Workflow

Tracking monthly sales commissions is essential for evaluating sales team performance. In this guide, we will merge, clean, transform, aggregate, and export sales commission data using Pandas in a real-world workflow.

By the end of this guide, you'll learn:

  • How to merge multiple datasets
  • How to handle missing values
  • How to compute commissions per sales rep
  • How to extract and group data by month
  • How to generate a CSV report for stakeholders



Dataset Overview

We will use two datasets:

1?? Toyota Sales Data (Toyota_sales_data.csv)

  • Contains sales transactions, including sale_amount, commission_percentage, and sale_date.

2?? Sales Reps Data (sales_reps_data.csv)

  • Contains sales representative details, including rep_id, first_name, last_name, and region.

Goal:

  • Calculate total commissions earned per sales rep per month
  • Generate a report with aggregated results


You can download the datasets from the following GitHub link: GitHub Datasets

Step 1: Load the Data

import pandas as pd

# Load datasets
toyota_sales_data = pd.read_csv("data/car_sales/toyota_sales_data.csv")
sales_reps_data = pd.read_csv("data/car_sales/sales_reps_data.csv")

# Preview data
print(toyota_sales_data.head())
print(sales_reps_data.head())        

Step 2: Merge Sales & Reps Data

We merge Toyota sales data and Sales reps data using rep_id as the key.

Merging DataFrames

merged_data = pd.merge(
    sales_reps_data, 
    toyota_sales_data, 
    left_on="rep_id", 
    right_on="sale_rep_id", 
    how="inner"
)

print(merged_data.head())  # Check merged dataset        

Ensures sales data is linked to the correct sales representatives


Step 3: Handle Missing Values

We noticed that the commission_percentage column contains NaN values.

Replace missing commission percentages with 0

merged_data = merged_data.fillna({"commission_pct": 0})

# Verify missing values are handled
print(merged_data["commission_pct"].isnull().sum())  # Should be 0        

Step 4: Compute Monthly Sales Commission

Extract sale_month from sale_date

Calculate commission_earned using:

?? commission_earned = sale_amount * commission_percentage

merged_data = (
    merged_data
    .assign(commission_earned=lambda df: round(df["sale_amount"] * df["commission_pct"], 2)) \
    .assign(sale_month=lambda df: pd.to_datetime(df["sale_date"]).dt.to_period("M")) \
)

print(merged_data.head())  # Check transformed data        

? Ensures all transactions are assigned to the correct month

? Commission calculations are rounded for accuracy


Step 5: Group & Aggregate Sales Data

We will group data by sale_month and rep_id to calculate:

? Total sales per rep

? Total commissions earned

monthly_summary = (
    merged_data
    .groupby(["sale_month", "rep_id", "first_name", "last_name", "email"]) \
    .agg(
        total_sales=("sale_amount", "sum"),
        total_commission=("commission_earned", "sum")
    ) \
    .reset_index()
)

print(monthly_summary.head())  # Check grouped summary        

? Efficiently computes performance metrics for each sales rep per month


Step 6: Export Final Report

? Save the final summary as a CSV file for stakeholders.

# Define file path
file_path = "data/car_sales/monthly_sales_commission.csv"

# Export to CSV
monthly_summary.to_csv(file_path, index=False)

print(f"Report saved successfully at: {file_path}")        

? Removes the default index to keep the report clean

? Ready for further analysis in Excel, BI tools, or databases


Final Chained Workflow - Clean & Efficient

?? Complete workflow in one streamlined code block:

# Load Data
toyota_sales_data = pd.read_csv("data/car_sales/Toyota_sales_data.csv")
sales_reps_data = pd.read_csv("data/car_sales/sales_reps_data.csv")

# Process Data
monthly_summary = (
    pd.merge(sales_reps_data, toyota_sales_data, left_on="rep_id", right_on="sale_rep_id", how="inner")
    .fillna({"commission_pct": 0}) \
    .assign(commission_earned=lambda df: round(df["sale_amount"] * df["commission_pct"], 2)) \
    .assign(sale_month=lambda df: pd.to_datetime(df["sale_date"]).dt.to_period("M")) \
    .groupby(["sale_month", "rep_id", "first_name", "last_name", "email"]) \
    .agg(
        total_sales=("sale_amount", "sum"),
        total_commission=("commission_earned", "sum")
    ) \
    .reset_index()
)

# Export Report
monthly_summary.to_csv("data/car_sales/monthly_sales_commission.csv", index=False)
print("Monthly Sales Commission Report Saved Successfully!")        

? Handles missing data

? Extracts monthly data

? Computes total commissions

? Exports a ready-to-use report


Best Practices

  • Handle Missing Data early to avoid incorrect calculations.
  • Use .assign() for adding new columns dynamically.
  • Use .query() and .groupby() to filter and aggregate efficiently.
  • Always review exported reports for accuracy.


?? Practice Assignment

?? Want to practice? Attempt the Monthly Sales Commission Calculation Using Pandas Assignment ?? Click here.


What’s Next?

In the next module, we will dive into Introduction to Fundamentals of Statistics for Data Analysis. This module will provide a strong foundation in statistical concepts that are essential for analyzing and interpreting data effectively.


Click ?? to Enroll in the Python for Beginners: Learn Python with Hands-on Projects. It only costs $10 and you can reach out to us for $10 Coupon.

Conclusion

In this module, we explored:

  • How to merge and clean multiple datasets.
  • How to compute sales commissions per month.
  • How to group, aggregate, and analyze sales data.
  • How to export structured reports for business use.


Conclusion: Data Transformation Using Pandas

Congratulations on completing the module on data transformation and processing using Pandas. You’ve taken a significant step toward mastering data preparation techniques, which are essential for effective data analysis.

Key Takeaways:

Throughout this module, we explored various techniques to modify, enrich, and prepare data efficiently:

  • Grouping and Aggregations: Utilized the groupby method for segmenting data, computed totals, averages, and counts, and applied multiple aggregation functions using the agg method.
  • Adding and Updating Columns: Created new columns for derived metrics like commission amount or profit margin and updated existing columns to reflect calculated or normalized values.
  • Merging and Joining DataFrames: Combined datasets using relational joins such as inner, left, right, and outer joins with pd.merge. Also, concatenated DataFrames row-wise and column-wise to expand datasets.
  • Applying Functions: Applied custom transformations using apply and map for row and column operations. Used lambda functions and custom logic for grouped transformations.
  • Chaining Multiple Transformations: Streamlined complex operations by chaining multiple Pandas methods, ensuring better readability, efficiency, and performance.

Keep in Mind:

Data transformation is a critical step in preparing datasets for meaningful analysis. These techniques are widely used in:

  • Sales Analytics — Summarizing revenue by region or sales representatives.
  • Customer Segmentation — Grouping customers based on spending patterns.
  • Financial Reporting — Generating aggregated metrics like averages, growth rates, or profit margins.
  • Feature Engineering — Transforming and enriching datasets for machine learning workflows.

Data transformation bridges raw datasets and actionable insights. By mastering these techniques, you are well-equipped for advanced analytics, visualization, and machine learning tasks.


?? Engage With Us!

? Authored by Siva Kalyan Geddada , Abhinav Sai Penmetsa

?? Share this article with anyone interested in data engineering, Python, or data analysis. ?? Have questions or need help? Comment below! Let's discuss.

?? Follow us for more hands-on data science tutorials!


要查看或添加评论,请登录

ITVersity, Inc.的更多文章

社区洞察

其他会员也浏览了