Daily Pandas Operations
A Guide to Data Manipulation with Pandas: Reshaping and Combining Data
Pandas is a powerful Python library for data manipulation.
Creating a Blank DataFrame
To start, let's create a blank (empty) DataFrame. This can be done using the pd.DataFrame() constructor without providing any data.
import pandas as pd
blank_df = pd.DataFrame()
This blank_df will be an empty DataFrame with no rows or columns. You can then add data or define columns as needed.
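As a minimal sketch of the "add data or define columns" step, the snippet below shows two common ways to go from an empty frame to a populated one. The column names `name` and `cost` and the sample values are assumptions chosen for illustration.

```python
import pandas as pd

# Start from an empty frame: no rows, no columns.
blank_df = pd.DataFrame()
assert blank_df.empty

# Option 1: declare the column names up front and fill rows in later.
# (Column names here are hypothetical.)
typed_df = pd.DataFrame(columns=["name", "cost"])

# Option 2: assign whole columns to the empty frame.
# The first assignment sets the number of rows; later ones must match it.
blank_df["name"] = ["a", "b", "c"]
blank_df["cost"] = [1.0, 2.5, 4.0]
```

If the data is already available, passing it directly to the constructor is usually simpler than starting from an empty frame and growing it.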
Left Join with Different Column Names
Suppose you have two DataFrames and you want to perform a left join on columns with different names. You can use the merge function in Pandas to achieve this. Here's how:
merged_df = df1.merge(df2, left_on='ID1', right_on='ID2', how='left')
In this example, we join df1 and df2 on the 'ID1' column from df1 and the 'ID2' column from df2. Because this is a left join, every row of df1 is kept, and rows with no match in df2 get NaN in the columns that come from df2.
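To make the behavior concrete, here is a small sketch with made-up data. The contents of `df1` and `df2`, and the `name` and `score` columns, are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data: the key column has a different name in each frame.
df1 = pd.DataFrame({"ID1": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"ID2": [1, 3], "score": [10, 30]})

merged_df = df1.merge(df2, left_on="ID1", right_on="ID2", how="left")

# Both key columns are kept in the result. The row with ID1 == 2 has no
# match in df2, so its ID2 and score values are NaN.
print(merged_df)

# Optionally drop the now-redundant right-hand key.
tidy_df = merged_df.drop(columns="ID2")
```

Dropping the right-hand key afterwards is a matter of taste; keeping it can be handy for spotting unmatched rows.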
Dropping Rows with Null Values
To drop rows from a DataFrame where values in a specific column are null (NaN), you can use the dropna() method. For instance:
df = df.dropna(subset=['col1'])
After this operation, df will contain only the rows where 'col1' is not null.
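The following sketch shows the effect of `subset` on a small, made-up frame: only nulls in 'col1' cause a row to be dropped, while nulls in other columns are left alone. The sample values and the `cleaned` variable name are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one null in each column.
df = pd.DataFrame({
    "col1": ["x", np.nan, "z"],
    "col2": [1.0, 2.0, np.nan],
})

# Only rows where col1 is null are removed.
cleaned = df.dropna(subset=["col1"])

# The original index labels (0 and 2) survive; reset them if a clean
# 0..n-1 index is wanted.
cleaned = cleaned.reset_index(drop=True)
```

Note that the last row is kept even though its 'col2' value is NaN, because 'col2' is not listed in `subset`.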
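Setting Column Values Conditionally
You can conditionally set values in one column based on values in another column. The function below returns 'fixed_value' when 'col1' contains the substring 'fix' and 'variable_value' otherwise; apply with axis=1 calls it once per row.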
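def set_col2(row):
    # Assumes col1 holds strings; a missing value (NaN/None) would raise a TypeError here.
    if 'fix' in row['col1']:
        return 'fixed_value'
    else:
        return 'variable_value'

# axis=1 passes each row to set_col2 as a Series
df['col2'] = df.apply(set_col2, axis=1)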
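For larger frames, a vectorized alternative is often preferable to a row-wise apply. The sketch below is not the approach used above but a swapped-in technique: it combines `Series.str.contains` with `numpy.where`, and treats missing values as "does not contain". The sample data is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical data, including a missing value.
df = pd.DataFrame({"col1": ["fix this", "leave it", None, "prefix"]})

# Boolean mask: True where col1 contains the literal substring "fix".
# regex=False matches plain text; na=False maps missing values to False.
contains_fix = df["col1"].str.contains("fix", regex=False, na=False)

# Pick one of two values per row based on the mask.
df["col2"] = np.where(contains_fix, "fixed_value", "variable_value")
```

As with the `in` check, this is a substring match, so 'prefix' also counts as containing 'fix'.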
Reshaping Data with Pivot Tables
To reshape data from a long layout to a wide one, you can use a pivot table. Suppose you have a DataFrame with columns 'name', 'cost', and 'costtype', and you want one row per 'name', one column per 'costtype' value, and the total cost in each cell. You can achieve this using the pivot_table function:
pivot_df = df.pivot_table(index='name', columns='costtype', values='cost', aggfunc='sum', fill_value=0)
The resulting pivot_df has one row per 'name' and one column per unique 'costtype' value. Each cell holds the summed cost for that combination, and fill_value=0 puts 0 instead of NaN where a combination never occurs.
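Here is a small worked sketch of that pivot. The names, cost types, and amounts are made up for illustration, and `flat_df` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical long-format data: one row per individual cost entry.
df = pd.DataFrame({
    "name": ["alice", "alice", "alice", "bob"],
    "costtype": ["travel", "travel", "food", "food"],
    "cost": [100, 50, 20, 30],
})

pivot_df = df.pivot_table(
    index="name",
    columns="costtype",
    values="cost",
    aggfunc="sum",   # alice's two travel entries are added together
    fill_value=0,    # bob has no travel entry, so that cell becomes 0
)
print(pivot_df)

# Optional: turn 'name' back into a regular column and drop the
# 'costtype' label on the column axis.
flat_df = pivot_df.reset_index()
flat_df.columns.name = None
```

Flattening is only needed if later steps expect 'name' as an ordinary column rather than as the index.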
These operations demonstrate some of the data manipulation capabilities of Pandas. This article has covered creating an empty DataFrame, joining on differently named columns, dropping rows with null values, setting one column from another, and reshaping with a pivot table.