Numpy
Shiva Shankar Moogi
Data Science Intern at Aware AI| Data Analyst Intern at Savant Technologies| MSc Big Data & Data Science Student at Northumbria University
What is Numpy ?
* Library used by Python for scientific computing and data analysis
* Supports arrays and a wide range of mathematical operations and functions
* Provides features like linear algebra, statistical functions, random number generation, etc.
Why is it Useful ?
* Supports arrays
* Mathematical and statistical functions
* Fast and efficient computation
* Integration with other libraries
* Data analysis
* Scientific computation
* Large data manipulation
Numpy Array and Python Lists
Numpy Array Vs Python List
In?[8]:
! pip install numpy # To Install Numpy Library
Requirement already satisfied: numpy in c:\users\91879\anaconda3\lib\site-packages (1.23.5)
NumPy arrays are multidimensional structures used to store large amounts of numerical data. An array is like a grid of homogeneous values (every element has the same data type), whereas a Python list is a collection of elements that can hold values of any data type, such as integers, floats and strings.
Let's create an array and a Python list
In?[9]:
import numpy as np
array = np.array([1,2,3,4,5])
py_list = [1,2,3,4,5] # named py_list so that the built-in list is not shadowed
print("Array : ",array)
print("List : ",py_list)
Array : [1 2 3 4 5]
List : [1, 2, 3, 4, 5]
As we can see, both print the same values (the array is just shown without commas). So you probably have a question in your head: what is the difference between an array and a Python list?
Advantage of Numpy Array Over Python Lists
NumPy arrays give a fast and efficient way of manipulating and working with numerical data compared to Python lists. The following examples show what makes NumPy arrays better than Python lists when dealing with numerical data.
Let's start with one example
In?[12]:
a = np.array([1,2,3,4])
b = np.array([5,6,7,8])
print("Sum :",a+b)
#dot product
print("Dot Product :",np.dot(a,b))
Sum : [ 6 8 10 12]
Dot Product : 70
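For contrast, here is a small sketch of what the same + operator does with plain Python lists. The variable names are made up for this illustration.

```python
import numpy as np

list_a = [1, 2, 3, 4]
list_b = [5, 6, 7, 8]

# With lists, + concatenates instead of adding element by element
concatenated = list_a + list_b

# An element-wise sum of lists needs an explicit loop or comprehension
list_sum = [x + y for x, y in zip(list_a, list_b)]

# With NumPy arrays, + is already element-wise (vectorized)
array_sum = np.array(list_a) + np.array(list_b)

print(concatenated)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(list_sum)      # [6, 8, 10, 12]
print(array_sum)     # [ 6  8 10 12]
```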
Let's determine the memory usage of both
In?[15]:
import sys
Array = np.array([1,2,3,4])
Lists = [1,2,3,4]
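print("Memory Usage of Array :", Array.nbytes) # bytes of the array's data buffer only
print("Memory usage of Lists :",sys.getsizeof(Lists)) # bytes of the list object only, not of the integers it refers to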
Memory Usage of Array : 16
Memory usage of Lists : 88
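The two numbers above are not measured in quite the same way: nbytes counts only the array's data buffer, while sys.getsizeof on a list counts the list object but not the integer objects it refers to. Below is a rough sketch of a more like-for-like comparison. The helper name list_total_bytes is made up for this example, and the exact byte counts depend on the platform and Python version, so no output is shown.

```python
import sys
import numpy as np

n = 1000
arr = np.arange(n)
lst = list(range(n))

def list_total_bytes(values):
    """Size of the list object plus the size of every element it refers to."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# For an array that owns its data, getsizeof covers the object and its buffer
array_bytes = sys.getsizeof(arr)
list_bytes = list_total_bytes(lst)

print("Array total bytes:", array_bytes)
print("List total bytes :", list_bytes)
```

The gap grows with the number of elements, because every list element is a separate Python object while the array stores raw numbers side by side.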
Compatibility with Other Libraries
NumPy arrays are widely used in scientific computing and machine learning, and many other libraries, such as Pandas and TensorFlow, support NumPy arrays.
This means that you can easily integrate NumPy arrays into your workflow.
When to Use What?
NumPy arrays are better suited for numerical computations, especially those that benefit from vectorization (operations on whole arrays of numbers). They provide a more efficient and convenient representation of numerical data than Python lists.
On the other hand, this doesn't mean that we should never use lists. We can use lists when we are working with a small amount of data. Lists are also more flexible and can store a variety of data types, including non-numerical ones like strings and objects.
Creation of Arrays
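There are two main mechanisms for creating arrays: converting other Python structures such as lists and tuples, and using NumPy's built-in creation functions such as np.zeros, np.ones and np.arange.
Let's create an array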
In?[22]:
import numpy as np
#converting list into Numpy Array
List = [1,2,3,4]
Arr = np.array(List)
print("List into Arr :", Arr)
#converting Tuple to Numpy Array
Tuple = (1,2,3)
ARR = np.array(Tuple)
print("Tuple into Arr : ",ARR)
List into Arr : [1 2 3 4]
Tuple into Arr : [1 2 3]
#Lets Create 2d Array
In?[23]:
array2 = np.array([[1,2,3],[4,5,6]])
array2
Out[23]:
array([[1, 2, 3],
[4, 5, 6]])
In?[24]:
#lets Create 3d array
In?[26]:
array3 = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
array3
Out[26]:
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
In?[28]:
array3.ndim # To check how many dimensions there are
Out[28]:
3
np.zeros : Creates an array filled with zeros
In?[29]:
np.zeros((2,4))
Out[29]:
array([[0., 0., 0., 0.],
[0., 0., 0., 0.]])
np.ones : Creates an Array filled with Ones
In?[30]:
np.ones((3,4))
Out[30]:
array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
np.arange : Creates an array of evenly spaced values within a given interval (the stop value is excluded)
In?[36]:
a = np.arange(2,10) # Here we have given the start and stop parameters
a
Out[36]:
array([2, 3, 4, 5, 6, 7, 8, 9])
In?[35]:
a = np.arange(2,10,2) #Here we have given interval of 2
a
Out[35]:
array([2, 4, 6, 8])
np.linspace : Creates an array with a specified number of evenly (linearly) spaced values within a specified interval.
In?[39]:
np.linspace(1,5,10)
Out[39]:
array([1. , 1.44444444, 1.88888889, 2.33333333, 2.77777778,
3.22222222, 3.66666667, 4.11111111, 4.55555556, 5. ])
In this example, start is 1, stop is 5, and num is 10. Therefore, np.linspace generates an array with 10 equally spaced values between 1 and 5 (inclusive)
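np.linspace spaces its values evenly (linearly); for logarithmic spacing NumPy has a separate function, np.logspace. Here is a short sketch of the difference, which also shows the endpoint parameter of np.linspace. The variable names are only for illustration.

```python
import numpy as np

linear = np.linspace(1, 5, 5)                       # evenly spaced, stop value included
no_endpoint = np.linspace(1, 5, 4, endpoint=False)  # stop value excluded
log_spaced = np.logspace(0, 2, 3)                   # 10**0, 10**1, 10**2

print(linear)       # [1. 2. 3. 4. 5.]
print(no_endpoint)  # [1. 2. 3. 4.]
print(log_spaced)   # [  1.  10. 100.]
```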
np.full() : Creates an Array of specified shape and fills it with a specified value. np.full is useful when you want to create an array of a specific shape and initialize it with a constant value.
In?[40]:
np.full((3,4),6)
Out[40]:
array([[6, 6, 6, 6],
[6, 6, 6, 6],
[6, 6, 6, 6]])
np.eye() : Creates a matrix with ones on the diagonal and zeros elsewhere. With a single argument, such as np.eye(3), it returns the square identity matrix; np.eye(3,4) below gives a 3 x 4 matrix.
In?[41]:
np.eye(3,4)
Out[41]:
array([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.]])
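np.empty : Creates an array without initializing its entries, so it contains whatever values happen to be in memory. They may look random, but they are not generated random numbers.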
In?[46]:
empty = np.empty([2,3],dtype = int)
empty
Out[46]:
array([[1053929728, 573, 1053928128],
[ 573, 1053922176, 573]])
Basic Operations
Here we will do some basic operations, including addition, subtraction, multiplication, etc.
Arithmetic Operations
In?[50]:
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([3,4,5,6,7])
#addition
print(a+b)
#subtraction
print(a-b)
#Multiplication
print(a*b)
#Division
print(a/b)
#modulo
print(a%b)
#exponentiation
print(a**b)
#Dot product (for 1-D arrays np.dot gives the dot product; for 2-D arrays it performs matrix multiplication)
c = np.array([1,2,3])
d = np.array([4,5,6])
print(np.dot(c,d))
[ 4 6 8 10 12]
[-2 -2 -2 -2 -2]
[ 3 8 15 24 35]
[0.33333333 0.5 0.6 0.66666667 0.71428571]
[1 2 3 4 5]
[ 1 16 243 4096 78125]
32
#Trigonometric Functions
NumPy's trigonometric functions expect angles in radians. So if our angles are in degrees, we first need to convert them to radians with np.deg2rad. Let's see how it works.
In?[52]:
trig1 = np.array([0,30,45,60,90])
trig2 = np.deg2rad(trig1)
In?[55]:
print("Cos Values is : ",np.cos(trig2))
print("Sin Values is : ",np.sin(trig2))
Cos Values is : [1.00000000e+00 8.66025404e-01 7.07106781e-01 5.00000000e-01
6.12323400e-17]
Sin Values is : [0. 0.5 0.70710678 0.8660254 1. ]
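Notice that cos(90 degrees) is printed above as 6.12323400e-17 instead of exactly 0. That is ordinary floating-point rounding. A minimal sketch of how one might deal with it, using np.isclose and np.round:

```python
import numpy as np

angles_deg = np.array([0, 30, 45, 60, 90])
cos_values = np.cos(np.deg2rad(angles_deg))

# cos(90 degrees) comes out as a tiny number instead of exactly 0
print(cos_values[-1] == 0)            # False
print(np.isclose(cos_values[-1], 0))  # True

# Rounding makes the printed values easier to read
print(np.round(cos_values, 4))
```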
#Logarithmic Functions
In?[56]:
print(np.log(a))
print(np.log10(a))
[0. 0.69314718 1.09861229 1.38629436 1.60943791]
[0. 0.30103 0.47712125 0.60205999 0.69897 ]
#Bitwise Operators
In?[57]:
bit1 = np.array([1,2,3,4],dtype = np.uint8)
bit2 = np.array([5,6,7,8],dtype = np.uint8)
In?[58]:
print("Bitwise and :", np.bitwise_and(bit1,bit2))
Bitwise and : [1 2 3 0]
In?[59]:
#Similarly, we can check bitwise_or as well
print("Bitwise Or :", np.bitwise_or(bit1,bit2))
Bitwise Or : [ 5 6 7 12]
Comparison Operators
In?[61]:
print(a==b)
print(a>b)
print(a<b)
print(a!=b)
print(a>=b)
print(a<=b)
[False False False False False]
[False False False False False]
[ True True True True True]
[ True True True True True]
[False False False False False]
[ True True True True True]
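NOTE: These comparisons are element-wise, so each one returns a Boolean array rather than a single True or False. To compare two arrays as a whole, use np.array_equal(a, b).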
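As the output above shows, every comparison operator works element by element and returns a Boolean array. Here is a minimal sketch of how those Boolean arrays can be reduced to a single answer with np.all, np.any and np.array_equal. The result names are only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([3, 4, 5, 6, 7])

elementwise = a < b                 # one Boolean per element
all_less = np.all(a < b)            # True only if every element passes
any_equal = np.any(a == b)          # True if at least one pair is equal
same_array = np.array_equal(a, b)   # single True/False for the whole arrays

print(elementwise)  # [ True  True  True  True  True]
print(all_less)     # True
print(any_equal)    # False
print(same_array)   # False
```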
Aggregate Functions
In?[74]:
#sum
print(np.sum(a))
#mean
print(np.mean(a))
#median
print(np.median(a))
#Standard deviation
print(np.std(a))
#Variance
print(np.var(a))
#Maximum
print(np.amax(a))
#Minimum
print(np.amin(a))
#sorting Operations
print(np.sort(a))
15
3.0
3.0
1.4142135623730951
2.0
5
1
[1 2 3 4 5]
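The aggregate functions above were applied to a 1-D array. On a 2-D array they also accept an axis argument. Here is a small sketch with a made-up matrix m:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = np.sum(m)                # sum of every element
column_sums = np.sum(m, axis=0)  # collapse the rows: one sum per column
row_means = np.mean(m, axis=1)   # collapse the columns: one mean per row

print(total)        # 21
print(column_sums)  # [5 7 9]
print(row_means)    # [2. 5.]
```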
Indexing and Slicing
Indexing and slicing are fundamental concepts when working with arrays in NumPy. Indexing is the process of accessing individual elements of an array. Slicing is the process of accessing a subarray, that is, a subset of elements within an array.
In?[75]:
#Indexing
import numpy as np
a = np.array([1,2,3,5,7,9])
a
Out[75]:
array([1, 2, 3, 5, 7, 9])
In?[76]:
a[3] # Index always starts from '0'.
Out[76]:
5
In?[77]:
b = np.array([[1,2],[3,4]])
b
Out[77]:
array([[1, 2],
[3, 4]])
In?[78]:
b[0][1] # Row 0 is [1,2] and row 1 is [3,4], so this is the element at row 0, column 1 (also written b[0, 1])
Out[78]:
2
In?[80]:
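#A Boolean Index
#Filtering the values based on a condition.
x = np.array([1,2,3,4,1,18,19])
x[x!=1] # selects the elements not equal to 1; the result is not stored, so x is unchanged
x=x-1 # subtracts 1 from every element of x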
x
Out[80]:
array([ 0, 1, 2, 3, 0, 17, 18])
In?[87]:
#Slicing
c = np.array([1,2,3,4,5])
c[2:4:1] # start:stop:step, the stop index is excluded
Out[87]:
array([3, 4])
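One detail worth knowing about slicing: a basic slice is a view onto the original array's data, not a copy. The sketch below shows the effect, along with negative indices and a reversed slice. The names view and independent are made up for this example.

```python
import numpy as np

c = np.array([1, 2, 3, 4, 5])

view = c[1:4]                # basic slice: a view onto c's data
view[0] = 99                 # also changes c[1]

independent = c[1:4].copy()  # explicit copy
independent[1] = -1          # c is not affected

print(c)        # [ 1 99  3  4  5]
print(c[::-1])  # [ 5  4  3 99  1]
print(c[-2:])   # [4 5]
```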
Reshaping, Splitting and Stacking of Array
These operations allow us to modify the shape and structure of an array, which is useful in a variety of situations in data preprocessing and analysis. Reshaping, splitting and stacking are important NumPy operations that enable us to perform a wide range of computations and data manipulations.
Reshaping : It is the process of converting a NumPy array into a different shape without changing its data.
Splitting : It is the process of dividing an array into smaller arrays along one or more axes.
Stacking : It is the process of combining two or more NumPy arrays into a single array, either along a new axis or along an existing one.
These three operations are very important for data analysis and data transformation.
Reshaping
In?[92]:
a = np.array([[1,2,3,4],[5,6,7,8]])
a
Out[92]:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
In?[95]:
b = a.reshape(4,2)
b
Out[95]:
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
When reshaping, make sure the new shape holds the same total number of elements as the original. In simple mathematical terms, the products of the dimensions must match: here 2 x 4 = 8 and 4 x 2 = 8.
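Here is a small sketch of that size rule in practice. It also uses -1, which asks NumPy to work out one dimension by itself. The variable names are only illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])  # 2 x 4 = 8 elements

inferred = a.reshape(-1, 2)  # -1 lets NumPy work out this dimension (here 4)
flat = a.reshape(-1)         # one dimension holding all 8 elements

try:
    a.reshape(3, 3)          # 9 slots cannot hold 8 elements
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Reshape failed:", err)

print(inferred.shape)  # (4, 2)
print(flat.shape)      # (8,)
```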
Splitting
In?[97]:
a
Out[97]:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
In?[100]:
b = np.split(a,2)
In?[101]:
b
Out[101]:
[array([[1, 2, 3, 4]]), array([[5, 6, 7, 8]])]
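np.split above divided the array along its first axis (the rows). As a quick sketch, the same function can split along the columns with axis=1, and np.array_split can be used when the pieces cannot be equal in size:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

left, right = np.split(a, 2, axis=1)     # split the 4 columns into 2 + 2
parts = np.array_split(np.arange(7), 3)  # uneven split is allowed here

print(left)
print(right)
print([p.tolist() for p in parts])  # [[0, 1, 2], [3, 4], [5, 6]]
```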
Stacking
In?[106]:
a = np.array([1,2,3,4])
b = np.array([5,6,7,8])
c = np.vstack((a,b))
d = np.hstack((a,b))
print(c)
print(d)
[[1 2 3 4]
[5 6 7 8]]
[1 2 3 4 5 6 7 8]
Concatenation
In?[113]:
a = np.array([1,2,3])
b = np.array([7,8,9])
c = np.concatenate((a,b),axis =0)
c
Out[113]:
array([1, 2, 3, 7, 8, 9])
Note: all the input arrays must have the same number of dimensions.
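For 2-D arrays the axis argument decides whether rows or columns are joined. Here is a short sketch, with np.stack added for comparison since it creates a new axis instead of extending an existing one. The arrays are made up for this example.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

rows = np.concatenate((a, b), axis=0)    # shapes (2, 2) + (1, 2) -> (3, 2)
cols = np.concatenate((a, b.T), axis=1)  # shapes (2, 2) + (2, 1) -> (2, 3)
stacked = np.stack((a, a), axis=0)       # new leading axis -> (2, 2, 2)

print(rows.shape, cols.shape, stacked.shape)  # (3, 2) (2, 3) (2, 2, 2)
```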
Transpose
In?[116]:
a = np.array([[1,2,3],[3,4,5],[6,7,8],[9,1,2]])
a
Out[116]:
array([[1, 2, 3],
[3, 4, 5],
[6, 7, 8],
[9, 1, 2]])
In?[122]:
a.shape
Out[122]:
(4, 3)
In?[118]:
a.ndim
Out[118]:
2
In?[127]:
b = a.transpose()
b
Out[127]:
array([[1, 3, 6, 9],
[2, 4, 7, 1],
[3, 5, 8, 2]])
In?[126]:
b.shape
Out[126]:
(3, 4)
Broadcasting
Broadcasting is a feature of the NumPy library in Python that allows arithmetic operations between arrays of different shapes.
The smaller array is broadcast across the larger array so that they have compatible shapes. Broadcasting solves the problem of shape compatibility between arrays of different dimensions during arithmetic operations.
Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python.
In?[131]:
a = np.array([1,2,3,4,5,6])
b = 2.0
c = a*b
c
Out[131]:
array([ 2., 4., 6., 8., 10., 12.])
Generic Rules of Broadcasting
In?[142]:
a = np.array([1,2,3,4])
b = np.array([[1],[2],[3]])
c = a*b
c
Out[142]:
array([[ 1, 2, 3, 4],
[ 2, 4, 6, 8],
[ 3, 6, 9, 12]])
In?[140]:
import numpy as np
# Broadcasting example
a = np.array([[1, 2, 3]]) # Shape: (1, 3)
b = np.array([[4], [5]]) # Shape: (2, 1)
# Broadcasting arrays with different but compatible shapes
result = a + b # Works: both arrays are broadcast to shape (2, 3)
In?[141]:
result
Out[141]:
array([[5, 6, 7],
[6, 7, 8]])
The shape of array a is (1, 3), which means it has one row and three columns. The shape of array b is (2, 1), indicating two rows and one column.
Now, let's compare the sizes of the dimensions:
For array a, the first dimension (rows) has a size of 1. For array b, the first dimension (rows) has a size of 2. The first dimension sizes are not the same, but according to the broadcasting rules, they can still be compatible. This is because one of the dimensions has a size of 1, which satisfies the broadcasting criteria.
The second dimension (columns) is where the sizes are being compared:
For array a, the second dimension (columns) has a size of 3. For array b, the second dimension (columns) has a size of 1. In this case, the size of the second dimension in array b is 1, which satisfies the broadcasting criteria. The broadcasting rules allow this dimension to be extended to match the size of the corresponding dimension in array a, which is 3.
Therefore, the sizes of the second dimension in a and b do match the broadcasting criteria, and broadcasting can be performed.
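The comparison described above can be checked directly. The sketch below uses np.broadcast_shapes (available in NumPy 1.20 and later) to ask what shape two arrays would broadcast to, and then shows a pair of shapes that cannot be broadcast at all.

```python
import numpy as np

# Shapes are compared from the last dimension backwards; two sizes are
# compatible when they are equal or when one of them is 1.
print(np.broadcast_shapes((1, 3), (2, 1)))  # (2, 3)
print(np.broadcast_shapes((4,), (3, 1)))    # (3, 4)

try:
    np.ones((2, 3)) + np.ones((2, 4))       # 3 vs 4: neither is 1
    incompatible = False
except ValueError as err:
    incompatible = True
    print("Cannot broadcast:", err)
```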
Limitations:
Plotting Numpy Arrays
Data visualization is the representation of data in graphical form, which makes it easier to understand and analyze large and complex datasets.
In?[1]:
#Lets Code it
!pip install matplotlib # pyplot is a module inside matplotlib, not a separate package
!pip install seaborn # the alias (as sb) belongs in the import statement, not in pip
Requirement already satisfied: seaborn in c:\users\91879\anaconda3\lib\site-packages (0.12.2)
In?[2]:
import matplotlib.pyplot as plt
import seaborn as sb
import numpy as np
Matplotlib
In?[8]:
#Lineplot
x = np.linspace(1,20,5)
y = np.sin(x)
plt.plot(x,y)
plt.xlabel("X - Axis")
plt.ylabel("Y - Axis")
plt.title("Line Plot")
plt.show()
In?[16]:
#scatterplot
x = np.random.rand(100)
y = np.random.rand(100)
plt.scatter(x,y)
plt.xlabel("X - Axis")
plt.ylabel("Y - Axis")
plt.title("Scatter Plot")
plt.show()
In?[18]:
#histogram
data = np.random.normal(100,20,1000)
plt.hist(data,color = "black")
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.title("Histogram")
plt.show()
The first argument, 100, is the mean of the normal distribution. The second argument, 20, is the standard deviation of the normal distribution. The third argument, 1000, is the number of random numbers to generate.
In?[20]:
#Barchart
x = ['A','B','C','D']
y = [1,2,3,4]
plt.bar(x,y)
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")
plt.title("Bar Chart")
plt.show()
In?[22]:
#Pie chart
labels = ['A','B','C','D']
Values = [10,20,30,40]
plt.pie(Values,labels = labels,startangle=90,autopct = '%1.1f%%')
plt.axis('equal')
plt.xlabel("X- Axis")
plt.ylabel("Y -Axis")
plt.title("Pie Chart")
plt.show()
In?[23]:
#boxplot
data = [np.random.normal(50,10,100),np.random.normal(80,20,100),np.random.normal(40,15,100)]
plt.boxplot(data)
plt.xlabel("Categories")
plt.ylabel("Values")
plt.title("Boxplot")
plt.show()
Seaborn
In?[26]:
#lineplot
x = np.linspace(1,20,5)
y = np.sin(x)
sb.lineplot(x=x,y=y)
plt.show()
In?[28]:
#pairplot
import pandas as pd
data = np.random.randn(100,5)
df = pd.DataFrame(data,columns = ["A","B","C","D","E"])
sb.pairplot(df)
plt.suptitle("Pair plot of the Data")
plt.show()
In?[29]:
# heat Map
data = np.random.random((10,10))
sb.heatmap(data)
plt.show()
In?[30]:
#Violin Plot
data = np.random.normal(0,1,100)
sb.violinplot(data)
plt.show()
I/O Handling with NumPy Arrays
I/O handling refers to the process of inputting and outputting data to and from a computer system. This includes reading data from a variety of sources, such as files or databases, and writing data to different types of storage, such as hard drives or cloud storage.
I/O handling is a crucial aspect of computer programming, as it allows a program to interact with the outside world and manipulate data.
NumPy provides several functions for I/O handling.
In?[33]:
#saving data to Text file
data = np.array([[1,2,3,4,5],[6,5,4,3,2]])
np.savetxt("Data.txt",data,delimiter = ",")
In?[34]:
#loading data from the text file, then saving to and loading from a binary file
loaded_data = np.loadtxt("Data.txt",delimiter = ",")
np.save("data.npy",data)
loaded_data = np.load("data.npy")
In?[35]:
#Compressing Arrays for Efficient Storage
np.savez_compressed("data_compressed.npz",data) # savez_compressed writes a .npz archive
Note : The files are saved in the current working directory on your local system; please check there.
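To see the three formats side by side, here is a self-contained round-trip sketch. It writes into a temporary folder (an assumption made for this example, so nothing is left behind on disk) and stores the compressed array under the key "data", a name chosen here for illustration.

```python
import os
import tempfile
import numpy as np

data = np.array([[1, 2, 3, 4, 5], [6, 5, 4, 3, 2]])

with tempfile.TemporaryDirectory() as folder:
    txt_path = os.path.join(folder, "data.txt")
    npy_path = os.path.join(folder, "data.npy")
    npz_path = os.path.join(folder, "data_compressed.npz")

    np.savetxt(txt_path, data, delimiter=",")
    from_txt = np.loadtxt(txt_path, delimiter=",")  # comes back as floats

    np.save(npy_path, data)
    from_npy = np.load(npy_path)                    # dtype is preserved

    np.savez_compressed(npz_path, data=data)        # stored under the key "data"
    with np.load(npz_path) as archive:
        from_npz = archive["data"]

print(from_txt.dtype)                  # float64
print(np.array_equal(from_npy, data))  # True
print(np.array_equal(from_npz, data))  # True
```

The text format is easy to read in any editor, while the .npy and .npz formats keep the exact data type of the array.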
Masking
Masking of arrays in NumPy involves creating a Boolean mask (an array of the same shape as the original array, with each element being either True or False) to filter out or manipulate specific elements based on a condition, by using the mask to index the original array.
In?[42]:
import numpy as np
arr = np.array([1,2,3,4,5,5,7,8,9])
Mask = arr<5
Result = arr[Mask]
Result
Out[42]:
array([1, 2, 3, 4])
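Masks can also be combined, used for assignment, or passed to np.where. The following is a minimal sketch of those three uses; the result names are made up for this example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 5, 7, 8, 9])

# Combine conditions with & (and), | (or), ~ (not); each side needs parentheses
between = arr[(arr > 2) & (arr < 8)]

# Assigning through a mask changes only the selected elements
capped = arr.copy()
capped[capped > 5] = 5

# np.where picks from two alternatives depending on the mask
labels = np.where(arr % 2 == 0, "even", "odd")

print(between)  # [3 4 5 5 7]
print(capped)   # [1 2 3 4 5 5 5 5 5]
print(labels)
```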
Structured Arrays
Structured arrays are ndarrays whose datatype is a composition of simpler datatypes organized as a sequence of named fields.
In?[48]:
dt = np.dtype([('name','S10'),('age','i4'),('height','f8')])
In?[58]:
# Create a Structured Array
people = np.array(([('John',32,1.83),('Shiva',28,1.62)]),dtype = dt)
In?[59]:
person = people[0]
print(person)
name = person['name']
print(name)
age = person['age']
print(age)
height = person['height']
print(height)
(b'John', 32, 1.83)
b'John'
32
1.83
In?[60]:
# Calculating the average height of all people
avg_height = people['height'].mean()
# Selecting the people older than 30
older_than_30 = people[people['age']>30]
In?[61]:
print(older_than_30)
[(b'John', 32, 1.83)]
Sorting Structured Arrays
In?[63]:
Sorted_peoples = np.sort(people,order ='age') # returns a sorted copy ordered by the 'age' field
#Modifying Fields
people['age'] +=1
#Change the height of the First Person
people[0]['height']=1.90
print(people[0]['height'])
1.9
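The names above print with a b prefix because the 'S10' field type stores bytes. As a variation (not the dtype used above), the sketch below uses 'U10' to store text as str, and also shows that np.sort returns a sorted copy while the original array keeps its order.

```python
import numpy as np

# 'U10' stores text as str instead of bytes, so names print without the b prefix
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('height', 'f8')])
people = np.array([('John', 32, 1.83), ('Shiva', 28, 1.62)], dtype=dt)

by_age = np.sort(people, order='age')  # sorted copy; people keeps its order

print(people['name'][0])   # John
print(by_age['name'][0])   # Shiva
print(people.dtype.names)  # ('name', 'age', 'height')
```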
Thank you for reading. I hope this tutorial is helpful for you! Please check the NumPy documentation to learn more. Thank you!