登录查看更多内容

Data Merging in Pandas: Left & Right Joins with Real-World Use Cases

ITVersity, Inc.

making IT resourceful (???????? ?????????????????????? ????????)

发布日期: 2025年2月12日

How to Efficiently Merge Datasets and Handle Missing Data in Pandas

Merging datasets is a crucial step in data analysis and data engineering. Often, we need to combine information from multiple tables while ensuring that no critical data is lost. In this article, we’ll dive into Left Joins and Right Joins in Pandas, explaining how they work, their key differences, and when to use them.

By the end of this guide, you’ll be able to merge datasets efficiently, handle missing records, and apply real-world data quality checks using Pandas' merge function.

Why This Module is Important

Merging datasets correctly ensures data consistency and accuracy for reporting, visualization, and machine learning. Here’s why Left and Right Joins are essential:

? Including All Parent or Child Records – Left Joins retain all records from the left dataset, while Right Joins retain all records from the right dataset.

? Identifying Missing or Orphaned Data – Helps in finding unmatched records and improving data quality.

? Handling Missing Data Efficiently – Understanding how to manage NaN values ensures complete and reliable datasets.

? Generating Meaningful Reports – Enables analysis of inactive sales reps, orphaned sales transactions, and other key business insights.

In this module, we’ll introduce various join techniques, giving you a solid foundation for working with relational datasets.

Understanding Left and Right Joins in Pandas

Left Join (how='left')

Includes all rows from the left table and only the matching rows from the right table.
Unmatched rows from the right table are filled with NaN values.
Example Use Case: Getting a complete list of sales reps, including those who haven’t made any sales.

Right Join (how='right')

Includes all rows from the right table and only the matching rows from the left table.
Unmatched rows from the left table are filled with NaN values.
Example Use Case: Getting a full list of sales transactions, including those without an assigned sales rep.

You can download the datasets from the following GitHub link: GitHub Datasets

Step-by-Step Implementation in Pandas

First, let’s create two sample DataFrames to demonstrate Left and Right Joins:

import pandas as pd

# Sales Representatives Table (Parent)
sales_reps = pd.DataFrame({
    "rep_id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "region": ["North", "South", "West"]
})

# Sales Data Table (Child)
sales = pd.DataFrame({
    "sale_id": [101, 102, 103, 104],
    "sale_rep_id": [1, 2, None, 4],  # Includes a missing and an unmatched ID
    "sale_amount": [500, 1000, 750, 1200]
})

Performing a Left Join

import pandas as pd

# Performing a LEFT JOIN
left_join = pd.merge(
    sales_reps,
    sales,
    left_on="rep_id",
    right_on="sale_rep_id",
    how="left"
)

Expected Output:

?? Observations:

? All sales reps are included, even if they don’t have sales.

? Sales Rep ID 3 appears, but the sale_id and sale_amount fields are NaN.

Performing a Right Join

import pandas as pd

# Performing a RIGHT JOIN
right_join = pd.merge(
    sales_reps,
    sales,
    left_on="rep_id",
    right_on="sale_rep_id",
    how="right"
)

Expected Output:

?? Observations:

? All sales transactions are included, even if there’s no corresponding sales rep.

? Sales Rep ID 4 and NaN appear, but the name and region fields are NaN.

Practical Use Cases

1?? Identifying Sales Reps Without Sales (Inactive Reps)

To find sales reps who haven’t made any sales, we can filter for NaN values in sale_id:

inactive_reps = left_join[left_join["sale_id"].isnull()]  

print(inactive_reps)

领英推荐

Introduction To Data Science: A Comprehensive Guide…

Ze Learning Labb 11 个月前

Data Science vs. Data Analytics – What’s the…

Ze Learning Labb 2 个月前

Exclusive Sneak Peak At What Is Data Science!

Ze Learning Labb 1 年前

Expected Output:

2?? Detecting Orphaned Sales Transactions

To find sales transactions that don’t have an assigned sales rep, we can filter for NaN values in rep_id:

orphaned_sales = right_join[right_join["rep_id"].isnull()]  

print(orphaned_sales)

Expected Output:

Best Practices for Left and Right Joins

? Choose the Right Join Type:

Use Left Join when the parent dataset (e.g., sales reps) is more important.
Use Right Join when the child dataset (e.g., sales transactions) is more important.

? Inspect NaN Values:

Check for missing values using .isnull() to identify unmatched rows.

? Handle Missing Data:

Replace NaN values with "Unknown" where necessary.

right_join.fillna({"name": "Unknown", "region": "Unknown"}, inplace=True)

Expected Output:

Practice Assignment

?? Want to practice? Attempt the Left and Right Joins in Pandas Assignment ?? Click Here.

?? Need help? Leave a comment, and we’ll assist you!

What’s Next?

Now that we have explored Left and Right Joins, we will move on to Full Outer Joins in Pandas.

Next Up: Outer Joins – Merging Complete Datasets While Retaining All Data

We’ll see how to combine datasets while keeping all records from both sources, filling in missing values where necessary.

Click ?? to Enroll in the Python for Beginners: Learn Python with Hands-on Projects. It only costs $10 and you can reach out to us for $10 Coupon.

Conclusion

In this guide, we covered:

? How Left and Right Joins work and when to use them.

? The difference between retaining unmatched rows from left or right datasets.

? How to handle missing values (NaN) effectively.

? Real-world use cases such as finding inactive reps and orphaned sales.

By mastering these techniques, you’ll be able to confidently merge and integrate data for data analysis, reporting, and machine learning workflows.

?? Engage With Us!

? Authored by Siva Kalyan Geddada , Abhinav Sai Penmetsa

?? Share this article with anyone interested in data engineering, Python, or data analysis. ?? Have questions or need help? Comment below! Let's discuss.

?? Follow us for more hands-on data science tutorials!

Data Merging in Pandas: Left & Right Joins with Real-World Use Cases

ITVersity, Inc.

making IT resourceful (???????? ?????????????????????? ????????)

How to Efficiently Merge Datasets and Handle Missing Data in Pandas

Why This Module is Important

Understanding Left and Right Joins in Pandas

Left Join (how='left')

Right Join (how='right')

Step-by-Step Implementation in Pandas

Performing a Left Join

Performing a Right Join

Practical Use Cases

1?? Identifying Sales Reps Without Sales (Inactive Reps)

领英推荐

2?? Detecting Orphaned Sales Transactions

Best Practices for Left and Right Joins

? Choose the Right Join Type:

? Inspect NaN Values:

? Handle Missing Data:

Practice Assignment

What’s Next?

Conclusion

?? Engage With Us!

AI, Data and Cloud Updates

2,356 位关注者

ITVersity, Inc.的更多文章

其他会员也浏览了

Data Merging in Pandas: An Introduction to Combining Datasets

Pandas Series: Your First Step to Efficient Data Handling

Applying DataFrame Transformations with Pandas: A Practical Approach

Tools in Data Science

What is the difference between Data Science, Business Analytics and Big Data

Dataiku 12.5: Your Data's Ultimate Wingman

Top 7 Data Science Tools for 2023

Data Science vs Data Analytics — How to decide which one is right for you?

BigQuery Data Canvas: Revolutionizing Data Analytics

Unleashing the Potential of Data Science: An In-Depth Manual on KNIME and its Influence on Analytics and Decision-Making

How to Efficiently Merge Datasets and Handle Missing Data in Pandas

Why This Module is Important

Understanding Left and Right Joins in Pandas

Left Join (how='left')

Right Join (how='right')

Step-by-Step Implementation in Pandas

Performing a Left Join

Performing a Right Join

Practical Use Cases

1?? Identifying Sales Reps Without Sales (Inactive Reps)

领英推荐

2?? Detecting Orphaned Sales Transactions

Best Practices for Left and Right Joins

? Choose the Right Join Type:

? Inspect NaN Values:

? Handle Missing Data:

Practice Assignment

What’s Next?

Conclusion

?? Engage With Us!

AI, Data and Cloud Updates

2,356 位关注者

ITVersity, Inc.的更多文章

The Power of Generative AI: What It Is, Why You Should Learn It, and How It’s Changing the World

Descriptive vs Inferential Statistics in Pandas: How to Analyze and Interpret Data Effectively

Introduction to Fundamentals of Statistics for Data Analysis

Monthly Sales Commission Analysis with Pandas - A Complete Workflow

Mastering Advanced Chaining Techniques in Pandas

Efficient Data Processing with Pandas: Chaining Transformations

Adding and Updating Columns in Pandas: A Step-by-Step Guide

Mastering Row-Level Transformations in Pandas with apply()

Advanced Custom Aggregation Functions in Pandas

How to Create Custom Aggregation Functions in Pandas

其他会员也浏览了

Data Merging in Pandas: An Introduction to Combining Datasets

Pandas Series: Your First Step to Efficient Data Handling

Applying DataFrame Transformations with Pandas: A Practical Approach

Tools in Data Science

What is the difference between Data Science, Business Analytics and Big Data

Dataiku 12.5: Your Data's Ultimate Wingman

Top 7 Data Science Tools for 2023

Data Science vs Data Analytics — How to decide which one is right for you?

BigQuery Data Canvas: Revolutionizing Data Analytics

Unleashing the Potential of Data Science: An In-Depth Manual on KNIME and its Influence on Analytics and Decision-Making