Daily Pandas Operation

Priyanka Sain

Data Engineer at Intel, Supply Chain | Power BI Instructor

Published: September 21, 2023

A Guide to Data Manipulation with Pandas: Reshaping and Combining Data

Pandas is a powerful Python library for data manipulation and analysis. In this guide, we'll explore how to perform common data operations using Pandas, including reshaping and combining data.

Creating a Blank DataFrame

To start, let's create a blank (empty) DataFrame. This can be done using the pd.DataFrame() constructor without providing any data.

import pandas as pd

blank_df = pd.DataFrame()

This blank_df will be an empty DataFrame with no rows or columns. You can then add data or define columns as needed.
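As a minimal sketch of what "adding data or defining columns" can look like, the snippet below checks that the frame is empty and then fills it. The column names 'name' and 'cost' and the sample values are assumptions chosen only for illustration.

```python
import pandas as pd

# An empty DataFrame: no rows, no columns
blank_df = pd.DataFrame()
print(blank_df.empty)   # True
print(blank_df.shape)   # (0, 0)

# Define the columns up front, still without any rows
# (hypothetical column names)
typed_df = pd.DataFrame(columns=["name", "cost"])
print(typed_df.shape)   # (0, 2)

# Add data column by column; the first assignment sets the row count
filled_df = blank_df.copy()
filled_df["name"] = ["a", "b"]
filled_df["cost"] = [10, 20]
print(filled_df.shape)  # (2, 2)
```

When the full data is already available, passing it straight to the constructor is usually simpler than starting from an empty frame.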

Left Join with Different Column Names

Suppose you have two DataFrames and you want to perform a left join on columns with different names. You can use the merge function in Pandas to achieve this. Here's how:

merged_df = df1.merge(df2, left_on='ID1', right_on='ID2', how='left')

In this example, we're joining df1 and df2 on the 'ID1' column from df1 and the 'ID2' column from df2, performing a left join.
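To make the behavior of the left join concrete, here is a small self-contained sketch. The contents of df1 and df2 (and the 'name' and 'score' columns) are made-up sample data, not taken from the article.

```python
import pandas as pd

# Hypothetical sample data with differently named key columns
df1 = pd.DataFrame({"ID1": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"ID2": [1, 3], "score": [90, 70]})

# Left join: every row of df1 is kept, matched rows of df2 are attached
merged_df = df1.merge(df2, left_on="ID1", right_on="ID2", how="left")

# ID1 == 2 has no match in df2, so its ID2 and score are NaN
print(merged_df)

# Both key columns survive the merge; drop the right-hand key if it is redundant
merged_df = merged_df.drop(columns="ID2")
```

Because unmatched rows receive NaN, an integer column such as 'score' is converted to float in the merged result.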

Dropping Rows with Null Values

To drop rows from a DataFrame where values in a specific column are null (NaN), you can use the dropna() method. For instance:

df = df.dropna(subset=['col1'])

After this operation, df will contain only the rows where 'col1' is not null.
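A short sketch of the effect, using made-up data: the subset argument limits the null check to 'col1', so nulls in other columns do not cause a row to be dropped. The 'other' column is an assumption added for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: nulls in both 'col1' and 'other'
df = pd.DataFrame({
    "col1": [1.0, np.nan, 3.0],
    "other": [np.nan, "x", "y"],
})

# Only rows where 'col1' is null are removed
cleaned = df.dropna(subset=["col1"])

# The original index labels (0 and 2) are kept; reset them if a clean range is wanted
cleaned = cleaned.reset_index(drop=True)
print(cleaned)
```

The first row stays in the result even though its 'other' value is null, because 'other' is not listed in subset.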

Conditional Column Value Assignment

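You can conditionally set values in one column based on values in another column. The function below sets 'col2' to 'fixed_value' when the text in 'col1' contains the substring 'fix', and to 'variable_value' otherwise: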

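def set_col2(row):
    # Assumes 'col1' holds strings; a null value here would raise a TypeError
    if 'fix' in row['col1']:
        return 'fixed_value'
    else:
        return 'variable_value'

# axis=1 calls set_col2 once per row
df['col2'] = df.apply(set_col2, axis=1)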
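For larger DataFrames, a row-wise apply can be slow. One possible alternative, shown here as a sketch rather than as the article's method, is a vectorized version built on Series.str.contains and numpy.where. The sample values in 'col1' are assumptions, and na=False is used so that null entries count as "no match" instead of raising an error.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including a null value
df = pd.DataFrame({"col1": ["fix_a", "other", None, "prefix"]})

# Boolean mask: True where 'col1' contains the literal substring 'fix'
contains_fix = df["col1"].str.contains("fix", regex=False, na=False)

# Pick one of two values per row based on the mask
df["col2"] = np.where(contains_fix, "fixed_value", "variable_value")
print(df)
```

Note that 'prefix' also matches, since the check is a substring test, just as it is with the in operator.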

Reshaping Data with Pivot Tables

Pivot tables reshape long data into a wide layout. Suppose you have a DataFrame with the columns 'name', 'cost', and 'costtype', and you want one row per 'name', one column per 'costtype' value, and each cell holding the total cost for that combination. You can achieve this using the pivot_table function:

pivot_df = df.pivot_table(index='name', columns='costtype', values='cost', aggfunc='sum', fill_value=0)

Here, pivot_df has one row per 'name' and one column for each unique 'costtype' value. Each cell holds the summed cost for that pair, and fill_value=0 puts a 0 wherever a name has no entries for a cost type.
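The following sketch runs the same pivot_table call on a tiny made-up dataset so the result can be checked by hand. The names 'A' and 'B' and the cost types 'labor' and 'material' are assumptions for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per cost entry
df = pd.DataFrame({
    "name": ["A", "A", "A", "B"],
    "costtype": ["labor", "labor", "material", "labor"],
    "cost": [10, 5, 7, 3],
})

pivot_df = df.pivot_table(
    index="name",
    columns="costtype",
    values="cost",
    aggfunc="sum",
    fill_value=0,
)
# A: labor 10 + 5 = 15, material 7
# B: labor 3, material 0 (no entries, filled by fill_value)
print(pivot_df)

# Optional: turn 'name' back into a regular column and clear the columns label
flat_df = pivot_df.reset_index()
flat_df.columns.name = None
```

The flattened form is often easier to export or to merge with other DataFrames.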

This article has covered a handful of everyday Pandas operations: creating an empty DataFrame, joining on differently named columns, dropping rows with null values, assigning column values conditionally, and reshaping data with pivot tables. Together they handle a large share of routine data analysis and preparation work.
