Python for Data Science!


There are many programming languages for data science, so why Python specifically? Because Python comes with lots of pre-built libraries and functions, it is easier to code in than most other programming languages.

Intro to Python:

We often use Python for data science and related AI areas because its many libraries and pre-built functions make us more productive.

* In data science, we rely on libraries such as NumPy, Pandas, Seaborn, SciPy, and more.

* We will look at some basic Python concepts using NumPy and Pandas.

* These are all very useful later in the feature engineering part.

Pandas:



* Pandas is an open-source library for Python.

* Pandas powers most feature engineering; a large part of data science work runs on Pandas.

* It is the most essential library for data science.

* We will look at some of its most productive topics.


1. Indexing Technique:

* Helps to set the index for a DataFrame.

import pandas as pd  # importing pandas

# Creating data
pe = {"aravind": [12,13,14,15,15], "ara": [12,13,14,15,15], "evan": [12,131,41,51,21], "rosy": [12,31,24,25,43]}

df = pd.DataFrame(pe, index=["english","maths","physics","botany","biology"])

# Indexing technique
sc = df.set_index('aravind')  # use the 'aravind' column as the index
print(sc)
sc = sc.sort_index()  # sort the rows by their index
print(sc)
sc = sc.reset_index()  # move the index back to an ordinary column
print(sc)


2. Indexing Location:

* Helps to retrieve the data by position using the index.

a = sc.iloc[0:3, :]  # before the comma: rows; after it: columns
print(a)
# iloc - integer-location based indexing

# 0:3 - retrieves the first 3 rows.
# :   - retrieves all the columns.

# Indexing starts at 0 and ends at n-1, where n is the number of rows.
# You can extract whatever you want using indexing.

# Some examples
b = sc.iloc[0:9, :]
print(b)

c = sc.iloc[0:3, 1:3]
print(c)

# Copy the code and try it locally!


3. Filtering Technique:

* Helps to filter the data frame, a core part of feature engineering.

# Creating data
pe = {"english": [12,13,14,15,15], "tamil": [12,13,14,15,15], "maths": [12,131,41,51,21], "science": [12,31,24,25,43], "results": ["pass","fail","pass","fail","pass"]}

fil = pd.DataFrame(pe, index=["aravind","raghul","robo","bot","fate"])

aa = fil['results'] == 'pass'  # Boolean mask: True where results is 'pass'

print(fil[aa])  # keeps only the rows where the mask is True

print(fil[~aa])  # ~ negates the mask: rows where results is not 'pass'

        
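Masks can also be combined. A small sketch (the marks data here is made up, in the same shape as above) showing how to chain conditions with `&`, wrapping each condition in parentheses:

```python
import pandas as pd

# Made-up marks data in the same shape as above
pe = {"english": [12, 13, 14, 15, 15],
      "maths": [12, 131, 41, 51, 21],
      "results": ["pass", "fail", "pass", "fail", "pass"]}
fil = pd.DataFrame(pe, index=["aravind", "raghul", "robo", "bot", "fate"])

# & combines two masks; wrap each condition in parentheses
high_pass = fil[(fil["results"] == "pass") & (fil["maths"] > 20)]
print(high_pass)  # robo and fate
```

Use `|` in the same way when either condition should be enough to keep a row.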

4. Add and remove rows and columns:

* Helps to add and remove columns and rows.

# Creating data
ne = {'firstname': ['aravind','kowshik','dubuke'], 'lastname': ['r','vasu','kumar'], 'favoritecolor': ['red','blue','green']}
nes = pd.DataFrame(ne)

# Adding a new column from existing features
nes['fullname'] = nes['firstname'] + " " + nes['lastname']
print(nes['fullname'])

# Dropping columns; to drop several at once, pass them as a list
nes = nes.drop(columns=['firstname','lastname'])
        
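The code above only touches columns; rows work similarly. A hedged sketch (the `new_row` values are made up) using `pd.concat` to add a row and `drop(index=...)` to remove one:

```python
import pandas as pd

ne = {'firstname': ['aravind', 'kowshik', 'dubuke'],
      'lastname': ['r', 'vasu', 'kumar']}
nes = pd.DataFrame(ne)

# Add a row: build a one-row DataFrame and concatenate it on
new_row = pd.DataFrame({'firstname': ['rosy'], 'lastname': ['k']})
nes = pd.concat([nes, new_row], ignore_index=True)

# Remove a row by its index label
nes = nes.drop(index=0)
print(nes)  # three rows left: kowshik, dubuke, rosy
```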

5. GroupBy:

* It helps to group the data by one or more columns.

* Helps to summarise the data properly.

# Assuming the 'data' DataFrame created before, with 'make' and 'price' columns

data.groupby("make").mean()  # mean of every numeric column within each 'make' group

data.groupby(['make','price']).mean()  # grouping by two columns; other functions like std, var also work
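Since the `make`/`price` frame isn't shown above, here is a self-contained sketch with made-up car data:

```python
import pandas as pd

# Made-up cars data standing in for the frame used above
data = pd.DataFrame({"make": ["bmw", "bmw", "audi", "audi"],
                     "price": [100, 200, 300, 500]})

avg = data.groupby("make")["price"].mean()  # mean price per make
print(avg)  # audi -> 400.0, bmw -> 150.0

# Several statistics at once via agg
stats = data.groupby("make")["price"].agg(["mean", "std", "var"])
print(stats)
```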


        

6. Reading Multiple Files:

* Helps to read files in multiple formats (CSV, Excel, JSON, HDF5).

import pandas as pd

data = pd.read_csv('data.csv')  # reads CSV files
# you need to mention the extension (.csv)

data = pd.read_excel('data.xlsx')  # reads Excel files

data = pd.read_json('data.json')  # reads JSON files

data = pd.read_hdf('data.h5')  # reads HDF5 files (useful for deep learning)

# You can read many file formats like this; just match the function to the extension

7. Write Files:

* Helps to write your data out to other file formats.

import pandas as pd

data.to_json('data.json')  # write your data to a JSON file

data.to_csv('data.csv')  # write your data to a CSV file

data.to_html('html_page.html')  # you can name the output file

data.to_pickle('pickle_data')  # write the data to a pickle file easily

8. Finding Missing Values:

* It helps to find missing values.

import pandas as pd

data = pd.read_csv('data.csv') 

data.isnull()  # gives Boolean values: True where a value is missing, False otherwise

data.isnull().sum()  # shows how many missing values each feature has

data.isna().sum()  # alternate method to check missing values

9. Handling Missing Values:

* It helps to manage missing values.

* Handling missing values needs more statistical analysis, but here we will see some general functions!

import pandas as pd
data = pd.read_csv('data.csv') 

data.isna().sum()  # finding missing values

data.dropna()  # drops the rows that contain null values

data.fillna(34)  # fills the null values with 34; you can give any number
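Filling with a constant is rarely ideal; a common alternative is to fill with a statistic such as the column mean. A small sketch with made-up values:

```python
import pandas as pd
import numpy as np

# Made-up column with gaps
data = pd.DataFrame({"marks": [12.0, np.nan, 14.0, np.nan]})

# Fill missing values with the column mean instead of a fixed number
data["marks"] = data["marks"].fillna(data["marks"].mean())
print(data["marks"].tolist())  # [12.0, 13.0, 14.0, 13.0]
```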

        

10. Shuffling:

* Shuffling helps your model train better by removing any ordering in the data.

* It draws rows at random from the data and returns them in shuffled order.

import pandas as pd
data = pd.read_csv('data.csv')

data.sample(frac = 0.8)

# sample - draws a random (shuffled) sample
# frac - what fraction of the rows you need
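For a full, repeatable shuffle you can use `frac=1.0` with `random_state`, then rebuild a clean index. A small sketch with made-up data:

```python
import pandas as pd

data = pd.DataFrame({"x": range(10)})  # stand-in for the CSV above

# frac=1.0 shuffles every row; random_state makes the shuffle repeatable
shuffled = data.sample(frac=1.0, random_state=42).reset_index(drop=True)
print(shuffled["x"].tolist())  # same 10 values, new order
```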


Numpy:



It's time for Numpy!

* NumPy - Numerical Python.

* In NumPy we work with arrays and matrices.

Let's look at the differences between a list and an array.

* Array: contains only elements of the same data type. Mathematical computations are faster than with lists, and arrays use less memory.

* List: can contain any data type. It does not support element-wise mathematical operations and occupies more memory than an array.
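A quick demonstration of that difference:

```python
import numpy as np

nums = [1, 2, 3, 4]

# A list repeats itself under *, an array does element-wise math
print(nums * 2)            # [1, 2, 3, 4, 1, 2, 3, 4]
print(np.array(nums) * 2)  # [2 4 6 8]
```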

What is an array?

It's a collection of elements of the same type.

1. Create an array!

import numpy as np  # importing the NumPy library

# let's create some arrays

a = np.array([1,2,3,4])  # one-dimensional array

b = np.array([[1,2,3,4],[1,2,3,4]])  # two-dimensional array (arrays inside an array)

c = np.array([1,2,3,4], dtype = 'float')  # the dtype parameter changes the data type

d = np.array((1,2,3,4))  # a tuple also works to create an array

e = np.array([[[1,2,3,4],[1,2,3,4]]])  # three-dimensional array (a 2-D array within an array)

2. Arange function:

* a - array, range - the range function.

* It returns evenly spaced values within a given interval.

import numpy as np
# let's create with arange()

a1 = np.arange(1,10)  # 10 is exclusive, so it gives only 9 numbers
b1 = np.arange(1,10,3)  # 3 is the step; it returns values step-wise
c1 = np.arange(1,12, dtype = 'float')  # there is a dtype parameter too

3. Ones and Zeros:

* These create arrays filled entirely with ones or zeros.

# let's create with zeros()
import numpy as np

a2 = np.zeros(1, dtype = 'int')  # the default dtype is float; here we override it
b2 = np.zeros([2,3], dtype = 'int')  # a list gives the dimensions: 2 rows, 3 columns

# let's create with ones()

a3 = np.ones([3,3], dtype = 'int')

4. Linspace:

* Linspace - linearly spaced.

* It creates an array of evenly spaced values; it's quite similar to the arange() function.

# let's create a linear space

a4 = np.linspace(1,100, num = 5)  # evenly spaced: the difference between consecutive values is the same

b4 = np.linspace(1,100, 5, endpoint = False)  # use this if you don't want the ending number included

c4 = np.linspace(1,100, 6, retstep = True)  # retstep also returns the step between the values

d4 = np.linspace(1,129, 3, dtype = 'int')  # dtype works here too

e4 = np.linspace(1,35)  # without num it returns 50 numbers; num defaults to 50

5. Random:

* Return a random array.

* In random, we have four types.

i) rand - uniformly distributed values in [0, 1).

ii) randn - (n for normal) normally distributed values.

iii) randint - uniformly distributed integers in a range.

iv) ranf - uniformly distributed floating-point numbers in [0, 1).

import numpy as np

# let's try rand()

a5 = np.random.rand(5,5)  # rows and columns; random numbers from a uniform distribution

# let's try randn()

b5 = np.random.randn(5,5)  # random values drawn from the standard normal distribution

# let's try ranf()

c5 = np.random.ranf(5)  # gives only floats, with the given number of values

# let's try randint()

d5 = np.random.randint(1,10, size = [5,4])  # here the end point is exclusive

6. Attributes:

* It helps to check how many dimensions, shapes, and data types we have in our array.

# ndim tells you how many dimensions the array has
e5 = np.ndim(e)

# shape gives the shape of the array
f5 = np.shape(e)

# dtype gives the data type of the array's elements
g5 = e.dtype
print(g5)

7. Operations:

* It helps to do mathematical operations.

a8 = np.array([1,2,3])

a8 * 2  # multiply the array element-wise

a8 ** 2  # power

a8 / 3  # divide

a9 = np.array([1,2,3])
b9 = np.array([1,2,3])

ab = a9 + b9  # or equivalently: np.add(a9, b9)

# we can use some built-in functions like
np.sum(ab)
np.max(ab)
np.min(ab)
np.std(ab)
np.var(ab)

8. Broadcasting

* It helps to do operations with different sizes of arrays.

  • It has some rules: corresponding dimensions must be equal,
  • or at least one of them must be 1.

z = np.array([[2],[3],[5]])  # shape (3, 1)
y = np.array([1,2,3])        # shape (3,)

z + y  # broadcasts to shape (3, 3)
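A small sketch of how those rules play out on shapes (values made up):

```python
import numpy as np

# Shapes are compared right-to-left; a dim matches if equal or if one side is 1
a = np.ones((2, 3)) + np.array([10, 20, 30])  # (2,3) + (3,)  -> (2,3)
b = np.ones((3, 1)) + np.ones((1, 4))         # (3,1) + (1,4) -> (3,4)
print(a.shape, b.shape)

# Mismatched trailing dims raise an error
try:
    np.ones((2, 3)) + np.ones((4,))
except ValueError as err:
    print("cannot broadcast:", err)
```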

9. Manipulation:

* Helps to manipulate the data. NumPy has handy functions for this.

# Reshape

np.reshape(a9, (3,1))  # first the array, then the new shape (rows, columns)

# Resize

a10 = np.array([1,2,3])
np.resize(a10, (3,3))  # repeats the data as needed to fill the new shape

a10.resize((3,4), refcheck = False)  # in-place resize; missing entries are filled with zeros

10. Argmax and Argmin:

* Argument max and argument min.

* They give the index of the maximum value and the index of the minimum value.

a10.argmax()

a10.argmin()        
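On a 2-D array these return the index into the flattened array; `np.unravel_index` and the `axis` parameter recover positions. A small sketch with made-up values:

```python
import numpy as np

m = np.array([[3, 7, 1],
              [9, 2, 5]])

flat = m.argmax()  # index in the flattened array
print(flat, np.unravel_index(flat, m.shape))  # 3 (1, 0)

print(m.argmax(axis=0))  # max row index per column: [1 0 1]
print(m.argmax(axis=1))  # max column index per row: [1 0]
```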


Hope you gained new knowledge!

Thank you!

Name: R. Aravindan

Position: Content Writer.

Company: Artificial Neurons.AI


Explore!

Importance of Outliers

Introduction to Normalization
