Few Essential Pandas Functions.

Asha Pondicherry

Data Analytics and Data Engineering

Published: July 18, 2022

This summer, apart from volunteering on research work with Professor Julio, I also worked on a few Kaggle datasets that involved a lot of data crunching. This article discusses a few essential Pandas functions that came in handy for understanding those datasets and manipulating them for analysis.

The following topics are discussed in this article: the index, Range-Index, pivot_table, value_counts, and to_datetime.

Index: An index is simply a set of labels for the rows, just as the column names are labels for the columns.

Note that an index is not part of the dataframe's data, i.e., it is not counted as a column.

E.g., the shape of the above dataframe is (4, 4); the index does not add to the column count.

The main characteristics of an index are identification and selection:

Identification: An index label points to the location of a row's data.
Selection: A value is selected by its row label (index) together with its column name.

FYI, use set_index to set a column as the index if required, and use reset_index to go back to the original dataframe.
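As a minimal sketch of the points above, the snippet below builds a small dataframe, sets a column as the index, selects a value by row label and column name, and then resets the index. The column names and values are hypothetical and are not the data from the original screenshot.

```python
import pandas as pd

# Hypothetical 4x4 sample data; names and values are made up for illustration.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cleo", "Dev"],
        "age": [31, 25, 47, 38],
        "city": ["Lima", "Oslo", "Pune", "Kyiv"],
        "score": [88.5, 92.0, 79.5, 85.0],
    }
)

# The index is not counted as a column: shape is (rows, columns).
shape_before = df.shape

# Promote the "name" column to the index; it stops being a regular column.
indexed = df.set_index("name")
shape_indexed = indexed.shape

# Selection: pick one value by row label (index) and column name.
cleo_age = indexed.loc["Cleo", "age"]

# reset_index moves the index back into a regular column.
restored = indexed.reset_index()

print(shape_before, shape_indexed, cleo_age)
print(restored)
```

Note that set_index returns a new dataframe by default, so the original df is left unchanged.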

Range-Index: This concept is rarely discussed but is one of those good-to-know topics. A RangeIndex is an immutable index defined by a monotonic integer range (start, stop, step) instead of a stored list of labels. It is the default index type pandas gives a dataframe when no index is explicitly specified.

It is one of those “memory saving” techniques because only the start, stop, and step of the range are stored, not every label
It can also speed up some index operations
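The sketch below illustrates these properties under a few assumptions: the dataframe is a made-up example, and the memory comparison uses an arbitrary size of one million labels. Exact byte counts vary by pandas version, so none are shown.

```python
import numpy as np
import pandas as pd

# A dataframe created without an explicit index gets a RangeIndex by default.
df = pd.DataFrame({"value": [10, 20, 30, 40]})
default_index = df.index
is_range = isinstance(default_index, pd.RangeIndex)

# The index is immutable: assigning to a position raises a TypeError.
try:
    default_index[0] = 99
    mutation_failed = False
except TypeError:
    mutation_failed = True

# Memory comparison: a RangeIndex stores only start/stop/step, while an
# integer index built from an array stores every label.
n = 1_000_000
range_bytes = pd.RangeIndex(n).memory_usage()
explicit_bytes = pd.Index(np.arange(n)).memory_usage()

print(default_index)
print(is_range, mutation_failed)
print(range_bytes < explicit_bytes)
```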

Pivot Table: pandas.pivot_table offers the same functionality as the pivot table feature in Excel.

pivot_table can be used to group and aggregate data. Any column can be used as the index to retrieve aggregated values for our analysis.

The following example uses pivot_table on the Titanic dataset:

The screengrab above shows that the column ‘Survived’ has been set as the index to count, for each gender, the number of people who survived and the number who did not.
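Since the screengrab is not reproduced here, the following is a rough sketch of a comparable call. It assumes a tiny hand-made sample with the Titanic-style column names PassengerId, Survived, and Sex; the exact arguments used in the original screenshot are not known.

```python
import pandas as pd

# Hypothetical miniature sample using Titanic-style column names.
titanic = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3, 4, 5, 6, 7, 8],
        "Survived": [0, 1, 1, 0, 1, 0, 1, 0],
        "Sex": ["male", "female", "female", "male",
                "female", "male", "male", "female"],
    }
)

# Use "Survived" as the index and "Sex" as the columns, then count
# passengers in each combination.
counts = titanic.pivot_table(
    index="Survived",
    columns="Sex",
    values="PassengerId",
    aggfunc="count",
    fill_value=0,
)

print(counts)
```

Each cell of the result holds the number of passengers for one Survived value and one gender, which is the same kind of summary an Excel pivot table produces.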

Value_counts: This function is used almost like a ritual on every dataset, along with functions like info() and describe(). value_counts() can be used to find the unique values in a column and how many times each one occurs.

In the screenshot above, the ‘for’ loop gives the count of each unique value in every column; NA values are excluded by default.
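A loop of this kind might look like the sketch below. The dataframe is a hypothetical sample with one missing value, included to show the default NA handling and the dropna=False option.

```python
import pandas as pd

# Hypothetical sample with one missing value in the "Sex" column.
df = pd.DataFrame(
    {
        "Sex": ["male", "female", "female", None],
        "Pclass": [3, 1, 3, 3],
    }
)

# Count each unique value in every column; NA values are skipped by default.
counts_by_column = {}
for col in df.columns:
    counts_by_column[col] = df[col].value_counts()
    print(counts_by_column[col], "\n")

# Pass dropna=False to include missing values in the counts.
with_missing = df["Sex"].value_counts(dropna=False)
print(with_missing)
```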

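To_datetime: Research-oriented survey datasets often store the year, month, day, hour, and minute in separate columns. To combine these columns into a single date (displayed in ‘YYYY-MM-DD’ format), pass a dataframe of the parts to pd.to_datetime, e.g. pd.to_datetime(df[['year', 'month', 'day']]). pandas identifies the parts by column name, so the columns must be named year, month, and day (optionally also hour, minute, and second); rename them first if they are called something else.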
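Here is a minimal sketch of that step. The column names year_col, month_col, and date_col and the sample values are hypothetical; the rename is needed because pd.to_datetime expects the parts to be named year, month, and day.

```python
import pandas as pd

# Hypothetical survey data with the date parts in separate columns.
survey = pd.DataFrame(
    {
        "year_col": [2022, 2022],
        "month_col": [7, 12],
        "date_col": [18, 1],
    }
)

# Rename the parts to the names pd.to_datetime recognizes.
parts = survey[["year_col", "month_col", "date_col"]].rename(
    columns={"year_col": "year", "month_col": "month", "date_col": "day"}
)

# Assemble one datetime column from the year/month/day parts.
survey["date"] = pd.to_datetime(parts)

print(survey)
```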

[Screenshots: the to_datetime call and its output]
