Pandas : Handling Data (DataFrame and Series)

Do you have a big dataset and want to explore what the data is telling you? The Python Pandas library can help.

Pandas helps us load data of almost any kind into a Series or a DataFrame, which gives us an easy way to handle, manipulate and use the data.

In the last article we discussed NumPy in detail. In this one we are going to talk about another important library, Pandas, and its use in data analysis and exploration.

The following topics are covered : Series, DataFrame, data handling operations, functions, sparklines, basic operations such as min/max/median/mean, datetime handling, plus some other key topics.

Why Pandas :

Pandas provides an efficient way to slice data and the flexibility to merge, concatenate and reshape it. It provides the DataFrame, a 2-dimensional labelled data structure that is similar to a SQL table or an Excel sheet but more powerful and flexible in many scenarios.

·      Series : a 1-dimensional labelled array that can store any kind of data

·      DataFrame : a 2-dimensional sheet or table; think of it as a collection of Series that share the same index.
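The slicing, merging, concatenating and reshaping mentioned above can be sketched roughly as follows. This is only a minimal illustration: the city names, column names and numbers are invented sample data, not taken from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
sales = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Pune"],
    "units": [10, 25, 7],
})
regions = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Pune"],
    "region": ["North", "West", "West"],
})

# Slice: keep only the rows where units exceed 8.
big_sales = sales[sales["units"] > 8]

# Merge: SQL-style left join on the shared "city" column.
merged = sales.merge(regions, on="city", how="left")

# Concatenate: stack two frames with the same columns on top of each other.
more_sales = pd.DataFrame({"city": ["Chennai"], "units": [12]})
all_sales = pd.concat([sales, more_sales], ignore_index=True)

# Reshape: total units per region, one row per region.
by_region = merged.pivot_table(index="region", values="units", aggfunc="sum")

print(big_sales)
print(merged)
print(all_sales)
print(by_region)
```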

 

Pandas vs NumPy :

Pandas offers a 1-dimensional structure (Series) and a 2-dimensional structure (DataFrame), whereas NumPy can handle multi-dimensional arrays (ndarray). NumPy is also generally more memory efficient and faster at plain indexing, because a Pandas object carries extra index and label information on top of the underlying values.
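A small sketch of the difference in dimensionality, assuming nothing beyond NumPy and Pandas themselves; the array shape and column names are arbitrary.

```python
import numpy as np
import pandas as pd

cube = np.zeros((2, 3, 4))                     # a 3-dimensional ndarray
s = pd.Series([1, 2, 3])                       # always 1-dimensional
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # always 2-dimensional

print(cube.ndim, s.ndim, df.ndim)  # 3 1 2

# A DataFrame can be converted to a plain NumPy array when labels are not needed.
arr = df.to_numpy()
print(arr.shape)  # (2, 2)
```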

Install Pandas:

pip install pandas

Basic operations with Pandas:

Series

Creating 1-dimensional arrays
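A minimal sketch of a few common ways to build a Series. The subject names and values are made up for illustration.

```python
import pandas as pd

# From a list: Pandas assigns a default integer index 0..n-1.
marks = pd.Series([72, 85, 90])

# From a list with explicit labels as the index.
labelled = pd.Series([72, 85, 90], index=["maths", "physics", "chemistry"])

# From a dict: the keys become the index.
prices = pd.Series({"apple": 1.2, "banana": 0.5})

print(marks)
print(labelled["physics"])  # 85
print(prices.index.tolist())  # ['apple', 'banana']
```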


DataFrame

Creating a DataFrame, which is a collection of Series (a 2-dimensional array)
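One way to see the "collection of Series" idea is to build a DataFrame from individual Series. The names and ages below are hypothetical sample values.

```python
import pandas as pd

# Two Series that share the same default index 0..2.
name = pd.Series(["Asha", "Ravi", "Meera"])
age = pd.Series([28, 35, 41])

# Each dict key becomes a column name; each Series becomes a column.
df = pd.DataFrame({"name": name, "age": age})

print(df)
print(df.shape)          # (3, 2)
print(list(df.columns))  # ['name', 'age']
```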


Another way to get data : importing a file from a location on disk
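A hedged sketch of reading a CSV file with `pd.read_csv`. To keep the example self-contained it first writes a tiny file to a temporary folder; the file name `people.csv` and its contents are assumptions made up for this example, so in practice you would pass the path to your own file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical CSV content, written to disk only so the example can run on its own.
csv_text = "name,age,city\nAsha,28,Delhi\nRavi,35,Mumbai\nMeera,41,Pune\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(csv_text)

    # Read the file into a DataFrame; the first row is used as the header.
    df = pd.read_csv(path)

print(df.head())  # first rows of the data
print(df.shape)   # (3, 3)
```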


Access single column
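A minimal sketch of selecting one column, using made-up sample data. Single brackets return a Series, while double brackets return a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera"],
    "age": [28, 35, 41],
})

ages = df["age"]          # a Series
ages_frame = df[["age"]]  # a DataFrame with a single column

print(type(ages).__name__)        # Series
print(type(ages_frame).__name__)  # DataFrame
```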


Applying operations on a specific column
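A few typical column-level operations, sketched on invented data. The 18% tax rate and the 200 threshold are arbitrary values chosen for the example.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "item": ["pen", "book", "bag"],
    "price": [10.0, 250.0, 800.0],
})

# Arithmetic is applied to every value in the column at once.
df["price_with_tax"] = df["price"] * 1.18

# String methods are available through the .str accessor.
df["item_upper"] = df["item"].str.upper()

# apply() runs a custom function on each value.
df["is_expensive"] = df["price"].apply(lambda p: p > 200)

print(df)
```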


Applying some basic functions like min/max/median/mean and other operations
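A short sketch of the basic summary functions on a single numeric column. The scores are made-up values.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"score": [10, 20, 30, 40]})

print(df["score"].min())     # 10
print(df["score"].max())     # 40
print(df["score"].mean())    # 25.0
print(df["score"].median())  # 25.0

# describe() returns count, mean, std, min, quartiles and max in one call.
summary = df["score"].describe()
print(summary)
```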



Some good examples from LinkedIn posts about using Pandas

Sparkline :

https://github.com/iiSeymour/sparkline-nb/blob/master/sparkline-nb.ipynb

Render sparkline-style charts in Pandas DataFrames
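To give a feel for the idea, here is a simplified text-only sketch that draws a sparkline from Unicode block characters. It is not the method used in the linked notebook; the `sparkline` helper, the column names and the numbers are all hypothetical.

```python
import pandas as pd

# Eight block characters, from lowest to highest.
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map each value to a block character scaled between the min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal, so draw a flat line.
        return BARS[0] * len(values)
    return "".join(
        BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values
    )

# Hypothetical sample data: one list of weekly sales per product.
df = pd.DataFrame({
    "product": ["A", "B"],
    "weekly_sales": [[1, 3, 5, 7], [8, 6, 4, 2]],
})
df["trend"] = df["weekly_sales"].apply(sparkline)

print(df[["product", "trend"]])
```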


Philip Vollet :

https://www.dhirubhai.net/feed/update/urn:li:activity:6798876165504401408/?updateEntityUrn=urn%3Ali%3Afs_feedUpdate%3A%28V2%2Curn%3Ali%3Aactivity%3A6798876165504401408%29

Handling Datetime and calculations
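A minimal sketch of datetime handling, assuming date strings in ISO format. The order and delivery dates are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data with dates stored as text.
orders = pd.DataFrame({
    "ordered": ["2021-05-01", "2021-05-03"],
    "delivered": ["2021-05-04", "2021-05-10"],
})

# Convert the text columns to real datetime values.
orders["ordered"] = pd.to_datetime(orders["ordered"])
orders["delivered"] = pd.to_datetime(orders["delivered"])

# Subtracting two datetime columns gives a timedelta; .dt.days extracts whole days.
orders["days_to_deliver"] = (orders["delivered"] - orders["ordered"]).dt.days

# The .dt accessor exposes parts of the date, such as the weekday name.
orders["order_weekday"] = orders["ordered"].dt.day_name()

print(orders)
```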


Khuyen Tran :

https://www.dhirubhai.net/feed/update/urn:li:activity:6798596663997997056/


Explore more : MultiIndex lets you work with higher-dimensional data by storing and manipulating data with an arbitrary number of dimensions in lower-dimensional structures such as a Series or a DataFrame.

https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html
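A small sketch of the MultiIndex idea: a 1-dimensional Series that carries two levels of labels (year and quarter) and can be unfolded into a 2-dimensional table. The revenue figures are made up.

```python
import pandas as pd

# Two index levels: year and quarter.
index = pd.MultiIndex.from_tuples(
    [("2020", "Q1"), ("2020", "Q2"), ("2021", "Q1"), ("2021", "Q2")],
    names=["year", "quarter"],
)

# Hypothetical revenue numbers.
revenue = pd.Series([100, 120, 130, 150], index=index)

# Select everything for one year (outer level).
print(revenue.loc["2021"])

# Select a single value using both levels.
print(revenue.loc[("2020", "Q2")])  # 120

# unstack() moves the "quarter" level into columns, giving a year x quarter table.
wide = revenue.unstack("quarter")
print(wide)
```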

Cheat Sheet

[Image: Pandas cheat sheet]

I hope this article provides a basic understanding of how Pandas can help with different types of data operations.

Please share your feedback so I can improve further. Happy learning!



 
