ç™»å½•æŸ¥çœ‹æ›´å¤šå†…å®¹

Efficient Data Processing with Pandas: Chaining Transformations

ITVersity, Inc.

making IT resourceful (???????? ?????????????????????? ????????)

å‘å¸ƒæ—¥æœŸ: 2025å¹´2æœˆ22æ—¥

In this article, we'll explore chaining transformations in Pandas, a powerful technique for cleaning, transforming, and analyzing data efficiently. By chaining multiple operations, you can improve:

Efficiency â€“ Avoid unnecessary intermediate variables.
Readability â€“ Write concise and intuitive code.
Scalability â€“ Easily extend workflows for complex tasks.

Weâ€™ll demonstrate real-world examples using the Toyota Sales Dataset.

Why Use Chaining for Data Transformations?

Instead of writing multiple steps separately, chaining allows us to combine multiple operations into a single workflow.

Benefits:

Reduces temporary variables, improving memory efficiency.
Enhances readability, making code more intuitive.
Facilitates debugging, since each transformation builds on the previous one.

Letâ€™s see how it works in action!

You can download the datasets from the following GitHub link: GitHub Datasets

Step 1: Load the Dataset

import pandas as pd

# Load dataset
toyota_sales_data = pd.read_csv("data/car_sales/Toyota_sales_data.csv")

# Preview dataset
print(toyota_sales_data.head())

Step 2: Traditional Approach vs. Chained Approach

Let's filter completed sales and calculate commission.

Traditional (Step-by-Step) Approach

# Filter completed sales
completed_sales = toyota_sales_data[toyota_sales_data["sale_status"] == "Completed"]

# Compute commission earned
completed_sales["commission_earned"] = completed_sales["sale_amount"] * completed_sales["commission_pct"]

# Display results
print(completed_sales.head())

? Warning: This generates a Pandas SettingWithCopyWarning, indicating potential issues with modifying a slice of the DataFrame.

Step 3: Chaining Transformations

Using Pandas chaining, we can perform the same operations in one seamless workflow.

Chained Approach

sales_with_commission = (
    toyota_sales_data
    .query("sale_status == 'Completed'")  # Filter completed sales
    .assign(commission_earned=lambda x: x["sale_amount"] * x["commission_pct"])  # Compute commission
)

print(sales_with_commission.head())

Step 4: Handling Missing Values

What if commission_percentage has missing values (NaN)? We can fill missing values with 0 while chaining.

sales_with_commission = (
    toyota_sales_data
    .query("sale_status == 'Completed'")
    .assign(
        commission_pct=lambda x: x["commission_pct"].fillna(0),  # Replace NaN with 0
        commission_earned=lambda x: x["sale_amount"] * x["commission_pct"]  # Compute commission
    )
)

print(sales_with_commission.head())

? Handling missing values ensures data integrity.

Step 5: Using round() for Formatting

Letâ€™s round commission values to two decimal places.

sales_with_commission = (
    toyota_sales_data
    .query("sale_status == 'Completed'")
    .assign(
        commission_pct=lambda x: x["commission_pct"].fillna(0),
        commission_earned=lambda x: round(x["sale_amount"] * x["commission_pct"], 2)  # Round to 2 decimals
    )
)

print(sales_with_commission.head())

Step 6: Merging Multiple DataFrames with Chaining

Letâ€™s merge Sales Data with Sales Rep Data, then filter & compute commission in a single workflow.

é¢†è‹±æŽ¨è

Why Data Teams Need More Than Just Good Code

Andrew Madson MSc, MBA 4 ä¸ªæœˆå‰

Hex - The Best Product for the Technical Analyst

Tomasz Tunguz 3 å¹´å‰

Zyrix DataZen! ZDZ for Data Analysis in 2025 - Analytics Insight:

Zyrix DataZen! ZDZ for Data Analysis in 2025 -â€¦

Analytics Insight? 2 ä¸ªæœˆå‰

sales_reps_data = pd.read_csv("data/car_sales/sales_reps_data.csv")

# Chained transformations: Merge, Filter, Compute Commission
sales_with_commission = (
    pd.merge(sales_reps_data, toyota_sales_data, left_on="rep_id", right_on="sale_rep_id", how="inner")
    .query("sale_status == 'Completed'")
    .assign(
        commission_earned=lambda x: round(x["sale_amount"] * x["commission_pct"].fillna(0), 2)
    )
)

print(sales_with_commission.head())

Step 7: Debugging Chained Operations

Debugging a chained workflow can be done step-by-step:

# Step 1: Merge Data
merged_data = pd.merge(sales_reps_data, toyota_sales_data, left_on="rep_id", right_on="sale_rep_id", how="inner")

# Step 2: Filter Completed Sales
filtered_data = merged_data.query("sale_status == 'Completed'")

# Step 3: Compute Commission
final_data = filtered_data.assign(
    commission_earned=lambda x: round(x["sale_amount"] * x["commission_pct"].fillna(0), 2)
)

# Preview Data
print(final_data.head())

? Break transformations into steps before chaining them together.

Best Practices for Chaining

Ensure Readability

Use line breaks and parentheses for clarity.
Use comments to explain transformations.

Use .assign() for New Columns

Avoid modifying original DataFrame slices.
.assign() prevents SettingWithCopyWarning.

Use .query() for Filtering

More readable than df[df["column"] == "value"].

Use .fillna() for Handling Missing Data

Prevents NaN values from breaking calculations.

Break Down Complex Steps for Debugging

Test transformations individually before chaining.

Practice Assignment

?? Want to practice? Attempt the Chaining Transformations in Pandas Assignment

?? Click here.

?? Need help? Leave a comment, and weâ€™ll assist you!

Whatâ€™s Next?

In the next module, weâ€™ll explore advanced chaining techniques that allow you to write cleaner, more efficient, and readable Pandas code by linking multiple operations together seamlessly.

Click ?? to Enroll in the Python for Beginners: Learn Python with Hands-on Projects. It only costs $10 and you can reach out to us for $10 Coupon.

Conclusion

In this module, we explored:

How to use Pandas chaining to streamline data transformations.
How to merge, filter, and compute new columns efficiently.
How to debug complex workflows before chaining.
Best practices for readability and performance.

?? Engage With Us!

? Authored by Siva Kalyan Geddada , Abhinav Sai Penmetsa

?? Share this article with anyone interested in data engineering, Python, or data analysis. ?? Have questions or need help? Comment below! Let's discuss.

?? Follow us for more hands-on data science tutorials!

Efficient Data Processing with Pandas: Chaining Transformations

ITVersity, Inc.

making IT resourceful (???????? ?????????????????????? ????????)

Why Use Chaining for Data Transformations?

Step 1: Load the Dataset

Step 2: Traditional Approach vs. Chained Approach

Traditional (Step-by-Step) Approach

Step 3: Chaining Transformations

Chained Approach

Step 4: Handling Missing Values

Step 5: Using round() for Formatting

Step 6: Merging Multiple DataFrames with Chaining

é¢†è‹±æŽ¨è

Step 7: Debugging Chained Operations

Best Practices for Chaining

Ensure Readability

Use .assign() for New Columns

Use .query() for Filtering

Use .fillna() for Handling Missing Data

Break Down Complex Steps for Debugging

Practice Assignment

Whatâ€™s Next?

Conclusion

?? Engage With Us!

AI, Data and Cloud Updates

2,456 ä½å…³æ³¨è€…

ITVersity, Inc.çš„æ›´å¤šæ–‡ç«

ç¤¾åŒºæ´žå¯Ÿ

å…¶ä»–ä¼šå‘˜ä¹Ÿæµè§ˆäº†

65 Best Resources to Learn Data Analysis

Datatile: A Library for AutoEDA

Training Program Topics - List 3

From Raw Data to Actionable Insights: The Role of Preprocessing and Cleaning

Handling imbalanced data with SMOTE

Real-World Data Merging: Inner Joins & Aggregations in Pandas

CLASSIFICATION OF DATA STRUCTURE

?? Ridge vs. Lasso: Tuning Models for Stock Markets ??

End-To-End Data Processing

Langchain Challenges and How to Solve Them

Why Use Chaining for Data Transformations?

Step 1: Load the Dataset

Step 2: Traditional Approach vs. Chained Approach

Traditional (Step-by-Step) Approach

Step 3: Chaining Transformations

Chained Approach

Step 4: Handling Missing Values

Step 5: Using round() for Formatting

Step 6: Merging Multiple DataFrames with Chaining

é¢†è‹±æŽ¨è

Step 7: Debugging Chained Operations

Best Practices for Chaining

Ensure Readability

Use .assign() for New Columns

Use .query() for Filtering

Use .fillna() for Handling Missing Data

Break Down Complex Steps for Debugging

Practice Assignment

Whatâ€™s Next?

Conclusion

?? Engage With Us!

AI, Data and Cloud Updates

2,456 ä½å…³æ³¨è€…

ITVersity, Inc.çš„æ›´å¤šæ–‡ç«

Top Resources & Job Updates - Edition - Mar 21, 2025

The Power of Generative AI: What It Is, Why You Should Learn It, and How Itâ€™s Changing the World

Descriptive vs Inferential Statistics in Pandas: How to Analyze and Interpret Data Effectively

Introduction to Fundamentals of Statistics for Data Analysis

Monthly Sales Commission Analysis with Pandas - A Complete Workflow

Mastering Advanced Chaining Techniques in Pandas

Adding and Updating Columns in Pandas: A Step-by-Step Guide

Mastering Row-Level Transformations in Pandas with apply()

Advanced Custom Aggregation Functions in Pandas

How to Create Custom Aggregation Functions in Pandas

ç¤¾åŒºæ´žå¯Ÿ

å…¶ä»–ä¼šå‘˜ä¹Ÿæµè§ˆäº†

65 Best Resources to Learn Data Analysis

Datatile: A Library for AutoEDA

Training Program Topics - List 3

From Raw Data to Actionable Insights: The Role of Preprocessing and Cleaning

Handling imbalanced data with SMOTE

Real-World Data Merging: Inner Joins & Aggregations in Pandas

CLASSIFICATION OF DATA STRUCTURE

?? Ridge vs. Lasso: Tuning Models for Stock Markets ??

End-To-End Data Processing

Langchain Challenges and How to Solve Them

é¢†è‹±æŽ¨è

2,456 ä½å…³æ³¨è€…

å…¶ä»–ä¼šå‘˜ä¹Ÿæµè§ˆäº†