Data Merging in Pandas: Left & Right Joins with Real-World Use Cases
How to Efficiently Merge Datasets and Handle Missing Data in Pandas
Merging datasets is a crucial step in data analysis and data engineering. Often, we need to combine information from multiple tables while ensuring that no critical data is lost. In this article, we’ll dive into Left Joins and Right Joins in Pandas, explaining how they work, their key differences, and when to use them.
By the end of this guide, you’ll be able to merge datasets efficiently, handle missing records, and apply real-world data quality checks using Pandas' merge function.
Why This Module is Important
Merging datasets correctly ensures data consistency and accuracy for reporting, visualization, and machine learning. Here’s why Left and Right Joins are essential:
? Including All Parent or Child Records – Left Joins retain all records from the left dataset, while Right Joins retain all records from the right dataset.
? Identifying Missing or Orphaned Data – Helps in finding unmatched records and improving data quality.
? Handling Missing Data Efficiently – Understanding how to manage NaN values ensures complete and reliable datasets.
? Generating Meaningful Reports – Enables analysis of inactive sales reps, orphaned sales transactions, and other key business insights.
In this module, we’ll introduce various join techniques, giving you a solid foundation for working with relational datasets.
Understanding Left and Right Joins in Pandas
Left Join (how='left')
Right Join (how='right')
You can download the datasets from the following GitHub link: GitHub Datasets
Step-by-Step Implementation in Pandas
First, let’s create two sample DataFrames to demonstrate Left and Right Joins:
import pandas as pd
# Sales Representatives Table (Parent)
sales_reps = pd.DataFrame({
"rep_id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
"region": ["North", "South", "West"]
})
# Sales Data Table (Child)
sales = pd.DataFrame({
"sale_id": [101, 102, 103, 104],
"sale_rep_id": [1, 2, None, 4], # Includes a missing and an unmatched ID
"sale_amount": [500, 1000, 750, 1200]
})
Performing a Left Join
import pandas as pd
# Performing a LEFT JOIN
left_join = pd.merge(
sales_reps,
sales,
left_on="rep_id",
right_on="sale_rep_id",
how="left"
)
Expected Output:
?? Observations:
? All sales reps are included, even if they don’t have sales.
? Sales Rep ID 3 appears, but the sale_id and sale_amount fields are NaN.
Performing a Right Join
import pandas as pd
# Performing a RIGHT JOIN
right_join = pd.merge(
sales_reps,
sales,
left_on="rep_id",
right_on="sale_rep_id",
how="right"
)
Expected Output:
?? Observations:
? All sales transactions are included, even if there’s no corresponding sales rep.
? Sales Rep ID 4 and NaN appear, but the name and region fields are NaN.
Practical Use Cases
1?? Identifying Sales Reps Without Sales (Inactive Reps)
To find sales reps who haven’t made any sales, we can filter for NaN values in sale_id:
inactive_reps = left_join[left_join["sale_id"].isnull()]
print(inactive_reps)
领英推荐
Expected Output:
2?? Detecting Orphaned Sales Transactions
To find sales transactions that don’t have an assigned sales rep, we can filter for NaN values in rep_id:
orphaned_sales = right_join[right_join["rep_id"].isnull()]
print(orphaned_sales)
Expected Output:
Best Practices for Left and Right Joins
? Choose the Right Join Type:
? Inspect NaN Values:
? Handle Missing Data:
right_join.fillna({"name": "Unknown", "region": "Unknown"}, inplace=True)
Expected Output:
Practice Assignment
?? Want to practice? Attempt the Left and Right Joins in Pandas Assignment ?? Click Here.
?? Need help? Leave a comment, and we’ll assist you!
What’s Next?
Now that we have explored Left and Right Joins, we will move on to Full Outer Joins in Pandas.
Next Up: Outer Joins – Merging Complete Datasets While Retaining All Data
We’ll see how to combine datasets while keeping all records from both sources, filling in missing values where necessary.
Click ?? to Enroll in the Python for Beginners: Learn Python with Hands-on Projects. It only costs $10 and you can reach out to us for $10 Coupon.
Conclusion
In this guide, we covered:
? How Left and Right Joins work and when to use them.
? The difference between retaining unmatched rows from left or right datasets.
? How to handle missing values (NaN) effectively.
? Real-world use cases such as finding inactive reps and orphaned sales.
By mastering these techniques, you’ll be able to confidently merge and integrate data for data analysis, reporting, and machine learning workflows.
?? Engage With Us!
? Authored by Siva Kalyan Geddada , Abhinav Sai Penmetsa
?? Share this article with anyone interested in data engineering, Python, or data analysis. ?? Have questions or need help? Comment below! Let's discuss.
?? Follow us for more hands-on data science tutorials!