Pandas For Data Analysis in Python
Riya Patadiya
Frontend Web Developer | ReactJS | Redux | ThreeJs | React-three-fiber | JavaScript | HTML5 | CSS | Bootstrap | React CoreUI |
Hello everyone, this one is for the tech-savvy people who love to code!
This post looks at the pandas library for the Python programming language. Pandas is generally used for data analysis, which means understanding and summarizing the content of a data-set: extracting the data and exploring it in mathematical terms. You can analyze the same data without pandas, but it is more time-consuming and takes more lines of code, because you have to walk through each record yourself and do the arithmetic on every field. Pandas gives us rich functions and methods that do this quickly and easily, and with less code. So let's start with installation.
Environment: Ubuntu 16.04, Python 2.7
sudo apt install python-pip
pip install --user pandas
After installation, we will do a small comparison between ‘With Pandas’ and ‘Without Pandas’ code. I will show you both versions to give you a better understanding.
First, we have to import the pandas library we just installed.
For ‘With Pandas’:
import pandas as pd
For ‘Without Pandas’, we have to import Python's built-in csv module:
import csv
We are using a CSV file named gini.csv, which has columns such as age, speciality and result.
From here on, #1 stands for ‘With Pandas’ and #2 for ‘Without Pandas’.
#1 open file.
pd_read = pd.read_csv('gini.csv')
#2 open file.
csv_read = open('gini.csv', 'rb')  # binary mode, as the csv module expects on Python 2
reader = csv.reader(csv_read)
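If you do not have gini.csv at hand, here is a minimal sketch for trying the loading step with in-memory data. It assumes Python 3 and an installed pandas. The rows and the `city` column are made up for illustration; only the column names age, speciality and result come from this post.

```python
import io
import pandas as pd

# Hypothetical stand-in for gini.csv: the rows and the `city` column are invented.
SAMPLE_CSV = u"""age,city,speciality,result
23,Pune,cardiology,yes
31,Delhi,neurology,yes
45,Mumbai,cardiology,no
28,Surat,dermatology,yes
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first rows, with the header parsed into column names
```

Passing a file-like object instead of a path is handy for quick experiments, since nothing has to be written to disk.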
#1 get the header (column names) of the data.
print(list(pd_read))
#2 get the header (column names) of the data.
header = next(reader)  # the first row of the file holds the column names
print(header)
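As a small aside, pandas also exposes the column names through the `columns` attribute. The sketch below assumes Python 3 and uses a made-up one-row data-set (the `city` column is invented) to show that it gives the same names as `list(...)`.

```python
import io
import pandas as pd

# Hypothetical data; only the names age, speciality and result come from this post.
df = pd.read_csv(io.StringIO(u"age,city,speciality,result\n23,Pune,cardiology,yes\n"))

columns = df.columns.tolist()  # the same names that list(df) returns
print(columns)
print(df.dtypes)               # the type pandas inferred for each column
```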
#1 get specific column’s data
print(pd_read.age)
#2 get specific column’s data
for i in reader:
    print(i[0])  # i is a whole row, i[0] is its age column
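To see both approaches side by side on the same input, here is a rough sketch that pulls the age column with pandas and with the csv module. It assumes Python 3; the sample rows and the `city` column are invented, and `age_index` is just a helper name picked for this example.

```python
import csv
import io
import pandas as pd

# Hypothetical stand-in for gini.csv: the rows and the `city` column are invented.
SAMPLE_CSV = u"""age,city,speciality,result
23,Pune,cardiology,yes
31,Delhi,neurology,yes
45,Mumbai,cardiology,no
28,Surat,dermatology,yes
"""

# pandas: bracket notation works for any column name
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
ages_pd = df['age'].tolist()

# csv module: find the column position in the header, then collect it row by row
reader = csv.reader(io.StringIO(SAMPLE_CSV))
header = next(reader)
age_index = header.index('age')
ages_csv = [int(row[age_index]) for row in reader]

print(ages_pd)
print(ages_csv)
```

Bracket notation (`df['age']`) is usually a safer habit than attribute access (`df.age`), because it also works when a column name contains spaces or clashes with a DataFrame method.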
#1 filter rows with a comparison.
pd_age = pd_read.age > 25
pd_result = pd_read.result == 'yes'
print(pd_read.loc[pd_age & pd_result])
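#2 filter rows with a comparison.
Here, k gives you the data of all columns in a row, so we have to use k[0], k[1] … to get a specific column (boring). Every value is also read as a string, so the age has to be converted with int() before it is compared with a number.
csv_read.seek(0)  # rewind the file: the previous loop already consumed every row
next(reader)  # skip the header row again
for k in reader:
    if int(k[0]) > 25 and k[3] == 'yes':  # k[0] for age, k[3] for result
        print(k)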
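The filter step is where the two approaches differ the most, so here is a self-contained sketch of it. It assumes Python 3, and the sample rows and the `city` column are invented. It also shows why comparing ages as strings is risky: strings are compared character by character, not by numeric value.

```python
import csv
import io
import pandas as pd

# Hypothetical stand-in for gini.csv: the rows and the `city` column are invented.
SAMPLE_CSV = u"""age,city,speciality,result
23,Pune,cardiology,yes
31,Delhi,neurology,yes
45,Mumbai,cardiology,no
28,Surat,dermatology,yes
"""

# Strings compare character by character, so '9' sorts after '25'.
string_compare = '9' > '25'
print(string_compare)

# pandas: read_csv infers that age is numeric, so the mask compares numbers
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
mask = (df['age'] > 25) & (df['result'] == 'yes')
filtered = df.loc[mask]
print(filtered)

# csv module: convert the age explicitly before comparing
reader = csv.reader(io.StringIO(SAMPLE_CSV))
next(reader)  # skip the header row
matches = [row for row in reader if int(row[0]) > 25 and row[3] == 'yes']
print(matches)
```

Note the parentheses around each condition in the pandas mask: `&` binds more tightly than `>` and `==`, so leaving them out changes the meaning of the expression.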
If you have counted, #1 takes 7 lines of code while #2 takes at least 10. Now you decide: which one would you choose? And of course, mathematical operations only get more confusing in #2 from here on.
I have shown you only the basics of pandas. There is much more you can do with a data-set in pandas, such as comparison operators (>, <, <=, >=, ==, !=) and summaries like mean, max and min. All of this makes your work easier and pretty handy.
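As a quick illustration of those summary methods, here is a minimal sketch on made-up data. It assumes Python 3; the rows and the `city` column are invented for the example.

```python
import io
import pandas as pd

# Hypothetical stand-in for gini.csv: the rows and the `city` column are invented.
SAMPLE_CSV = u"""age,city,speciality,result
23,Pune,cardiology,yes
31,Delhi,neurology,yes
45,Mumbai,cardiology,no
28,Surat,dermatology,yes
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
ages = df['age']

print(ages.mean())      # average age
print(ages.max())       # oldest
print(ages.min())       # youngest
print(ages.describe())  # count, mean, std, min, quartiles and max in one call
```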
Beyond that, we can use Python data visualization libraries such as matplotlib and seaborn to show our data in graphical form. Here I have done that for the ‘speciality’ column of our data-set, which is a real use of data analysis to get business insight.
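The exact chart code is not part of this post, so the following is only a sketch of one possible way to plot the ‘speciality’ column: count the rows per speciality and draw a bar chart. It assumes Python 3 with pandas and matplotlib installed. The sample rows, the `city` column and the output file name `speciality_counts.png` are all made up for the example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for gini.csv: the rows and the `city` column are invented.
SAMPLE_CSV = u"""age,city,speciality,result
23,Pune,cardiology,yes
31,Delhi,neurology,yes
45,Mumbai,cardiology,no
28,Surat,dermatology,yes
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Number of rows for each distinct speciality
counts = df['speciality'].value_counts()
print(counts)

# pandas draws the bar chart through matplotlib
ax = counts.plot(kind='bar')
ax.set_xlabel('speciality')
ax.set_ylabel('number of rows')
plt.tight_layout()
plt.savefig('speciality_counts.png')  # hypothetical output file name
```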
I always welcome constructive criticism, so comment and ask.
Don’t forget to Like & Share if you liked it.