
Mastering Pandas: Key Functions You Need to Know

Karan Parmar

AI/ML Engineer | Python, Backend Development, LLMs, and Cloud Platforms | Proven Record in Building Scalable AI Solutions with Django, Flask, TensorFlow, and OpenAI | Passionate about Data, Analytics, and Innovation

Published: June 11, 2024

Pandas is a Python library for data manipulation and analysis. Its two core data structures, Series and DataFrame, simplify data cleaning, transformation, and analysis, and it integrates well with other libraries in the data science ecosystem.

In this article, we'll explore some of the key functions in Pandas that are vital for anyone working with data. We'll cover data structures, inspection, selection, manipulation, aggregation, merging, and file input/output, with a practical example for each.

Here are some important functions and methods in pandas, along with examples:

Data Structures:

1. pandas.Series: One-dimensional labeled array capable of holding any data type.

   import pandas as pd
   series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

2. pandas.DataFrame: Two-dimensional, size-mutable, potentially heterogeneous tabular data.

   data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
           'Age': [28, 24, 35, 32],
           'City': ['New York', 'Paris', 'Berlin', 'London']}

   df = pd.DataFrame(data)
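To make the "labeled" part of these definitions concrete, here is a minimal sketch of how arithmetic between two Series aligns on index labels and not on position, and how each DataFrame column is itself a Series. The second Series, `other`, is a hypothetical addition for illustration and is not part of the original example.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# Hypothetical second Series that only covers some of the labels
other = pd.Series([10, 20, 30], index=['a', 'c', 'e'])

# Values are matched by label; labels missing on either side become NaN
total = series + other

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 24, 35, 32],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

# Selecting one column returns a Series that shares the DataFrame's index
ages = df['Age']
```

Because alignment is by label, the result is NaN wherever a label appears in only one of the two operands.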

Data Inspection:

1. DataFrame.head(): Return the first n rows of the DataFrame.

   df.head(3)

2. DataFrame.tail(): Return the last n rows of the DataFrame.

   df.tail(2)

3. DataFrame.info(): Print a concise summary of the DataFrame.

   df.info()

4. DataFrame.describe(): Generate descriptive statistics.

   df.describe()
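The inspection calls above can be tried together on the sample data. This is a small sketch that rebuilds the same DataFrame; the variable names are illustrative. One detail worth noting is that describe() summarizes only numeric columns unless asked otherwise.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 24, 35, 32],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

first_three = df.head(3)   # head() and tail() default to n=5 when no argument is given
last_two = df.tail(2)

# By default only numeric columns are summarized, so this covers 'Age' only
numeric_summary = df.describe()

# include='all' adds the text columns (count, unique, top, freq)
full_summary = df.describe(include='all')
```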

Data Selection:

1. DataFrame.loc: Access a group of rows and columns by labels.

   df.loc[1, 'Name']

2. DataFrame.iloc: Access a group of rows and columns by integer position.

   df.iloc[2, 0]

3. DataFrame.at: Access a single value for a row/column label pair.

   df.at[0, 'Name']

4. DataFrame.iat: Access a single value for a row/column pair by integer position.

   df.iat[3, 2]
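The four accessors above differ mainly in whether they use labels or integer positions. The sketch below, which rebuilds the sample DataFrame, highlights one common source of confusion: a slice in loc includes its end label, while a slice in iloc excludes its end position. It also shows a boolean mask passed to loc, a pattern the list above does not cover.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 24, 35, 32],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

# Label-based slice: rows labeled 1 AND 2 are both returned
by_label = df.loc[1:2, ['Name', 'City']]

# Position-based slice: only position 1 is returned (end is exclusive)
by_position = df.iloc[1:2, [0, 2]]

# A boolean mask selects the rows where the condition is True
over_30 = df.loc[df['Age'] > 30, 'Name']

# at and iat return a single scalar value
first_name = df.at[0, 'Name']
last_city = df.iat[3, 2]
```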

Data Manipulation:

1. DataFrame.drop(): Drop specified labels from rows or columns.

   # Returns a new DataFrame without 'Age'; df keeps the column for the examples that follow
   df_without_age = df.drop(columns=['Age'])

2. DataFrame.rename(): Alter axes labels.

   df.rename(columns={'Name': 'FirstName'}, inplace=True)

3. DataFrame.sort_values(): Sort by the values along either axis.

   df.sort_values(by='Age', ascending=False, inplace=True)

4. DataFrame.fillna(): Fill NA/NaN values using the specified method.

   df.fillna(0, inplace=True)
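As an alternative to inplace=True, each of these methods returns a new DataFrame by default, so they can be chained while the original stays untouched. The following is a sketch of that style; the missing age for Anna is a hypothetical value introduced so that fillna() has something to fill.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with one missing age
df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, np.nan, 35, 32],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

cleaned = (
    df.drop(columns=['City'])                  # remove a column
      .rename(columns={'Name': 'FirstName'})   # relabel a column
      .fillna({'Age': 0})                      # fill NaN in 'Age' only
      .sort_values(by='Age', ascending=False)  # oldest first
)
```

Here df still has its City column and its missing value afterwards; only cleaned reflects the changes.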

Data Aggregation:

1. DataFrame.groupby(): Group DataFrame using a mapper or by a Series of columns.

   # Select the numeric column first; averaging text columns raises a TypeError in pandas 2.x
   df.groupby('City')['Age'].mean()

2. DataFrame.agg(): Aggregate using one or more operations over the specified axis.

   df.agg({'Age': ['min', 'max', 'mean']})

3. DataFrame.apply(): Apply a function along an axis of the DataFrame.

   # Called on a single column this is Series.apply(), so the function receives one value at a time
   df['Age'] = df['Age'].apply(lambda x: x + 1)
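The three aggregation tools can be combined. The sketch below uses a hypothetical variant of the sample data in which two people share each city, so that every group has more than one row. It shows a per-group mean, named aggregation through groupby().agg(), and DataFrame.apply() with axis=1, where the function receives a whole row.

```python
import pandas as pd

# Hypothetical data: cities are repeated so the groups are not trivial
people = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                       'Age': [28, 24, 35, 32],
                       'City': ['Paris', 'Paris', 'Berlin', 'Berlin']})

# Mean age per city (a Series indexed by City)
mean_age = people.groupby('City')['Age'].mean()

# Named aggregation: each keyword becomes an output column
stats = people.groupby('City').agg(
    youngest=('Age', 'min'),
    oldest=('Age', 'max'),
    mean_age=('Age', 'mean'),
)

# axis=1 passes each row to the function as a Series
labels = people.apply(lambda row: f"{row['Name']} ({row['City']})", axis=1)
```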

Merging and Joining:

1. pd.merge(): Merge DataFrame or named Series objects with a database-style join.

   df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
   df2 = pd.DataFrame({'key': ['B', 'D', 'E'], 'value': [5, 6, 7]})
   # Both frames have a 'value' column, so the result contains value_x and value_y
   merged_df = pd.merge(df1, df2, on='key', how='inner')

2. DataFrame.join(): Join columns of another DataFrame.

   df3 = df1.set_index('key').join(df2.set_index('key'), lsuffix='_left', rsuffix='_right')
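To see how the join type changes the result, here is a short sketch using the same two frames. It compares an inner merge with an outer merge and uses the indicator option to record where each row came from. The suffix names and variable names are illustrative choices.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E'], 'value': [5, 6, 7]})

# Inner join: only keys present in both frames (B and D)
inner = pd.merge(df1, df2, on='key', how='inner', suffixes=('_left', '_right'))

# Outer join: all keys from both frames; indicator=True adds a '_merge' column
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)
origin = outer.set_index('key')['_merge']

# DataFrame.join() matches on the index and defaults to a left join
joined = df1.set_index('key').join(df2.set_index('key'),
                                   lsuffix='_left', rsuffix='_right')
```

With a left join, keys that exist only in df1 (A and C) are kept and their right-hand values are NaN.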

Input and Output:

1. pd.read_csv(): Read a comma-separated values (csv) file into DataFrame.

   df = pd.read_csv('file.csv')

2. DataFrame.to_csv(): Write DataFrame to a comma-separated values (csv) file.

   df.to_csv('output.csv', index=False)
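The effect of index=False is easiest to see in a round trip. The sketch below writes to an in-memory io.StringIO buffer in place of the file names used above, so it runs without touching the disk; both to_csv() and read_csv() accept file-like objects as well as paths.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 24, 35, 32],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

# Write without the index, then read the same text back
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# With the default index=True, the index is written as an unnamed first column
buffer_with_index = io.StringIO()
df.to_csv(buffer_with_index)
buffer_with_index.seek(0)
restored_with_index = pd.read_csv(buffer_with_index)
```

Reading the second buffer yields an extra column named "Unnamed: 0", which is why index=False is a common choice when the index carries no information.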

These are some of the fundamental functions and methods in pandas. Pandas also provides more advanced functionality, including time series analysis and plotting, which makes it an essential tool for data scientists and analysts.
