Mastering Row-Level Transformations in Pandas with apply()
Row-level transformations are crucial in data processing and feature engineering, allowing us to modify datasets dynamically. In this article, you’ll learn how to use Pandas' apply() method for row-wise operations, including:
Let’s dive in!
Why Use apply() for Row-Level Transformations?
The apply() method allows applying a function to each row or column of a Pandas DataFrame. It’s useful when:
You can download the datasets from the following GitHub link: GitHub Datasets
Step 1: Load the Dataset
Let’s load the dataset and preview the first 10 records.
import pandas as pd
# Load the dataset
toyota_sales_data = pd.read_csv("data/car_sales/toyota_sales_data.csv")
# Preview the dataset
print(toyota_sales_data.head())
This dataset contains sale amount and commission percentage, which we will use for transformations.
Step 2: Understanding the apply() Method
Before using apply(), let’s check its documentation:
help(pd.DataFrame.apply)
Key parameters of apply():
Since we are performing row-wise transformations, we will set axis=1.
Step 3: Creating a Derived Column - Commission Amount
Problem Statement:
We need to calculate the Commission Amount for each sale using:
Method 1: Using a Custom Function
# Define a function to calculate commission amount
def calculate_commission(sale):
return sale["sale_amount"] * sale["commission_percentage"]
# Apply function to calculate commission amount for each row
toyota_sales_data["commission_amount"] = toyota_sales_data.apply(calculate_commission, axis=1)
# Preview the data
print(toyota_sales_data.head())
Step 4: Handling Missing Values
Some commission_percentage values are missing (NaN). If not handled, NaN values will propagate.
Solution: Assign a Default Commission Rate
If commission_percentage is missing, we assume a default value of 0%.
# Enhanced function to handle NaN values
def calculate_commission_safe(sale):
commission_pct = sale["commission_percentage"] if pd.notnull(sale["commission_percentage"]) else 0
return sale["sale_amount"] * commission_pct
# Apply function
toyota_sales_data["commission_amount"] = toyota_sales_data.apply(calculate_commission_safe, axis=1)
# Preview data
print(toyota_sales_data.head())
Step 5: Using a Lambda Function
We can simplify the function using a Lambda expression.
领英推荐
toyota_sales_data["commission_amount"] = toyota_sales_data.apply(
lambda sale: sale["sale_amount"] * (sale["commission_percentage"] if pd.notnull(sale["commission_percentage"]) else 0),
axis=1
)
# Preview data
print(toyota_sales_data.head())
Step 6: Adding a Flag Column for High Commissions
Problem Statement:
We want to identify sales where the commission amount exceeds $1,000.
Solution:
Create a new column high_commission, which stores:
# Define function to flag high commission amounts
def flag_high_commission(sale):
return sale["commission_amount"] > 1000
# Apply function to create a new column
toyota_sales_data["high_commission"] = toyota_sales_data.apply(flag_high_commission, axis=1)
# Preview data
print(toyota_sales_data.head())
Step 7: Removing Unnecessary Columns
Since we no longer need sale_date and sale_status, let's drop them.
# Drop unnecessary columns
toyota_sales_data = toyota_sales_data.drop(columns=["sale_date", "sale_status"])
# Preview data
print(toyota_sales_data.head())
Step 8: Combining Multiple Transformations
We can calculate commission and flag high commissions in one step:
# Define combined transformation
def calculate_commission_and_flag(sale):
commission_pct = sale["commission_percentage"] if pd.notnull(sale["commission_percentage"]) else 0
commission_amount = sale["sale_amount"] * commission_pct
high_commission = commission_amount > 1000
return pd.Series([commission_amount, high_commission], index=["commission_amount", "high_commission"])
# Apply function to create two new columns
toyota_sales_data[["commission_amount", "high_commission"]] = toyota_sales_data.apply(calculate_commission_and_flag, axis=1)
# Preview data
print(toyota_sales_data.head())
Best Practices for Using apply()
Practice Assignment
?? Want to practice? Attempt the Implementing Custom Transformations with apply() in Pandas Assignment?? Click Here.
?? Need help? Leave a comment, and we’ll assist you!
What’s Next?
In the next lecture, we will explore Adding and Updating Columns in a Pandas DataFrame. This is a crucial aspect of data manipulation, allowing us to enhance, modify, and restructure our datasets effectively.
Click ?? to Enroll in the Python for Beginners: Learn Python with Hands-on Projects. It only costs $10 and you can reach out to us for $10 Coupon.
Conclusion
In this article, you learned:
Mastering apply() enables powerful transformations in Pandas, making your data processing workflows more efficient!
?? Engage With Us!
? Authored by Siva Kalyan Geddada , Abhinav Sai Penmetsa
?? Share this article with anyone interested in data engineering, Python, or data analysis. ?? Have questions or need help? Comment below! Let's discuss.
?? Follow us for more hands-on data science tutorials!