NumPy (Python Library) Overview + Some code
Introduction to NumPy
NumPy (short for Numerical Python) is a powerful Python library for numerical computing. It provides an efficient and convenient interface for working with large multi-dimensional arrays and matrices of numerical data, as well as a wide range of mathematical functions for performing various operations on them.
Some key features of NumPy include:
In conclusion, NumPy is an essential library for any data scientist or programmer working with numerical data in Python. Its efficient and convenient interface for working with large multi-dimensional arrays and matrices, coupled with its wide range of mathematical functions and its integration with other scientific computing libraries, makes it a powerful tool for performing complex data analysis and visualization tasks.
A refresher on some post-secondary linear algebra
Matrix: In mathematics, a matrix is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns.
Size of a matrix: The size of a matrix is determined by its number of rows and columns. For example, a matrix with 3 rows and 4 columns is called a 3 x 4 matrix. In general, an m x n matrix has m rows and n columns.
Special matrices
Matrix multiplication
In mathematics, matrix multiplication is a binary operation that takes two matrices and produces another matrix. The product of two matrices A and B is defined only if the number of columns of A is equal to the number of rows of B.
To compute the product C = AB, take the dot product of row i of A with column j of B; the result is the entry in row i and column j of C. The resulting matrix has the same number of rows as A and the same number of columns as B.
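The row-by-column rule above can be sketched directly in NumPy. The helper name `matmul_by_definition` and the two small matrices are made up for this illustration; the result is compared against NumPy's built-in `@` operator.

```python
import numpy as np

def matmul_by_definition(A, B):
    """Multiply A (m x n) by B (n x p) using the row-by-column rule."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("number of columns of A must equal number of rows of B")
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            # Entry (i, j) is the dot product of row i of A and column j of B.
            C[i, j] = np.dot(A[i, :], B[:, j])
    return C

A = np.array([[1, 2, 3], [4, 5, 6]])        # 2 x 3
B = np.array([[7, 8], [9, 10], [11, 12]])   # 3 x 2
C = matmul_by_definition(A, B)
print(C)        # a 2 x 2 matrix
print(A @ B)    # NumPy's built-in matrix product, for comparison
```

The explicit loops are only there to mirror the definition; in practice the built-in product is the better choice.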
Eigenvalues & Eigenvectors
In linear algebra, eigenvalues and eigenvectors are associated with square matrices. An eigenvector of a matrix A is a non-zero vector v that, when multiplied by A, results in a scalar multiple of itself, that is, Av = λv, where λ is a scalar called the eigenvalue corresponding to the eigenvector v.
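The defining relation Av = λv can be checked numerically. This is a minimal sketch using `np.linalg.eig` on a small symmetric matrix chosen only for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig returns the eigenvalues and a matrix whose columns are the eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

checks = []
for k in range(len(eigenvalues)):
    lam = eigenvalues[k]
    v = eigenvectors[:, k]          # k-th eigenvector is the k-th column
    # A @ v should equal lam * v up to floating-point rounding.
    checks.append(np.allclose(A @ v, lam * v))
    print(lam, v, checks[-1])
```

For this matrix the eigenvalues are 1 and 3; the order in which NumPy returns them is not guaranteed.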
Eigenvalues and eigenvectors have important applications in various fields such as physics, engineering, and data analysis. For example, in data analysis, eigenvalues and eigenvectors can be used in Principal Component Analysis (PCA) to identify the most important features in a dataset.
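As a rough illustration of the PCA idea, the sketch below eigen-decomposes the covariance matrix of a small synthetic dataset. The data, the random seed, and all variable names are assumptions made up for this example, and it is only one simple way to compute principal components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 2 features, the second strongly tied to the first.
x = rng.normal(size=200)
data = np.column_stack([x, 2.0 * x + 0.1 * rng.normal(size=200)])

# Center the data and compute the 2 x 2 covariance matrix of the features.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh is intended for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]        # sort from largest to smallest
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Fraction of the total variance captured by each principal direction.
explained = eigenvalues / eigenvalues.sum()
print(explained)

# Project the data onto the first principal direction only.
projected = centered @ eigenvectors[:, :1]
print(projected.shape)
```

Because the two features are almost perfectly correlated, nearly all of the variance falls on the first direction.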
NumPy implementation in Python
All the notes and code are based on Keith Galli's NumPy tutorial, with minor modifications.
Before using NumPy, make sure it is properly installed on your device. We can use Python IDEs like PyCharm or Wing 101, but JupyterLab/Jupyter Notebook might be a better option since it lets us run specific pieces of code as we wish.
The Basics
import numpy as np
# Create an array
[in] a = np.array([1, 2, 3])
[in] print(a)
[out] [1 2 3]
# Create a 2 x 3 array of floating-point numbers
[in] b = np.array([[9.1, 8.2, 7.3], [6.4, 5.5, 4.6]])
[in] print(b)
[out] [[9.1 8.2 7.3]
[6.4 5.5 4.6]]
# Get the shape of the array.
[in] b.shape
[out] (2, 3) # -> i.e., b is a 2 x 3 array because it has 2 rows and 3 columns.
# Get the number of dimensions of the array.
[in] b.ndim
[out] 2 # -> i.e., b is a 2-dimensional array since its elements are arranged in rows and columns.
Accessing/Changing Specific Elements, Rows and Columns.
# Always remember to import NumPy beforehand.
import numpy as np
[in] a = np.array([[1,2,3,4,5,6,7], [8,9,10,11,12,13,14]])
# -> a is a 2 x 7 array with 2 rows and 7 columns.
# Access a specific element [r, c] (row r and column c)
[in] a[0, 5]
[out] 6
# Access a specific row
[in] print(a[0, :])
[out] [1 2 3 4 5 6 7]
# Access a specific column
[in] print(a[:, 2])
[out] [ 3 10]
# Change a specific element in the array.
[in] a[1, 5] = 20
[in] print(a)
[out] [[1 2 3 4 5 6 7]
[8 9 10 11 12 20 14]]
# Change the whole column in the array
[in] a[:, 2] = [5, 6]
[in] print(a)
[out] [[1 2 5 4 5 6 7]
[8 9 6 11 12 20 14]]
# 3-dimensional array (i.e., the elements in the rows and columns are themselves arrays.)
[in] b = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
[in] print(b)
[out] [[[1 2]
[3 4]]
[[5 6]
[7 8]]]
# Get specific element in 3-dimensional arrays.
[in] print(b[0,1,1]) # First array; second row; second number
[out] 4
# Replace elements (similar to lower-dimensional arrays)
[in] b[1,0,1] = 10
[in] print(b)
[out] [[[1 2]
[3 4]]
[[5 10]
[7 8]]]
# Replace multiple elements at once
[in] b[:,:,0] = [[12, 24], [48, 96]]
[in] print(b)
[out] [[[12 2]
[24 4]]
[[48 10]
[96 8]]]
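Slices can also take a step, written start:stop:step, and negative indices count from the end of an axis. The short sketch below rebuilds the 2 x 7 array from this section with its original values.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [8, 9, 10, 11, 12, 13, 14]])

# Row 0, columns 1 up to (but not including) 6, taking every second column.
print(a[0, 1:6:2])     # [2 4 6]

# The same selection, with the stop given as a negative index.
print(a[0, 1:-1:2])    # [2 4 6]

# The last element of the second row.
print(a[1, -1])        # 14
```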
Initializing Different Types of Arrays
import numpy as np
# Zero matrix
[in] zero = np.zeros((3, 5)) # Creating a 3 x 5 zero matrix
[in] print(zero)
[out] [[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]]
# The default data type is float64; we can set dtype to int8/int16/int32/int64 if we'd like to get rid of the decimals.
# All 1s matrix
[in] one = np.ones((3, 5), dtype='int16') # Creating a 3 x 5 matrix whose elements are all 1 and setting the data type to int16.
[in] print(one)
[out] [[1 1 1 1 1]
[1 1 1 1 1]
[1 1 1 1 1]]
# All "other number" matrix
[in] other_num = np.full((3, 5), 36, dtype='int16')
[in] print(other_num)
[out] [[36 36 36 36 36]
[36 36 36 36 36]
[36 36 36 36 36]]
# Use "full_like" to reuse the shape and data type of an existing array.
[in] a = np.array([[1,2,3,4,5,6,7], [8,9,10,11,12,13,14]])
[in] b = np.full_like(a, 5)
[in] print(b)
[out] [[5 5 5 5 5 5 5]
[5 5 5 5 5 5 5]]
# Random decimal numbers
[in] c = np.random.rand(4, 3)
[in] print(c)
[out] [[0.2799959 0.0205946 0.0418946 ]
[0.79033484 0.49661239 0.66497414]
[0.77573664 0.29134228 0.95059255]
[0.24277333 0.97923166 0.32381059]]
# Random decimal numbers using the same shape of an existing array
[in] d = np.random.random_sample(a.shape)
[in] print(d)
[out] [[0.02079737 0.67971327 0.17267642 0.22453829 0.4227724 0.53196238
0.5366661 ]
[0.10864681 0.35630595 0.42413239 0.50289581 0.02159023 0.27639738
0.2568261 ]]
# Here, we are getting a 2 x 7 array with random decimal numbers.
# Random integer values
[in] e = np.random.randint(-4, 10, size=(3, 5)) # lowest possible value = -4, highest possible value = 10 - 1 = 9
[in] print(e)
[out] [[ 3 -2 -2 6 5]
[ 0 4 -1 6 7]
[ 3 -1 4 -3 -3]]
# Create the identity matrix
[in] I1 = np.identity(5)
[in] print(I1)
[out] [[1. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1.]]
# Repeat an array
[in] arr = np.array([[1,2,3], [4,5,6]])
[in] r1 = np.repeat(arr, 3, axis=0)
# 3 refers to the number of repetitions we'd like for the array;
# axis=0 means it's the rows that we want repetition of.
# If we want the repetition of columns, make axis=1.
[in] print(r1)
[out] [[1 2 3]
[1 2 3]
[1 2 3]
[4 5 6]
[4 5 6]
[4 5 6]]
# Be careful when copying the array
[in] f = np.array([1,2,3])
[in] print(f)
[in] g = f
[in] print(g)
[in] g[0] = 100
[in] print(g)
[in] print(f)
[out] [1 2 3]
[1 2 3]
[100 2 3]
[100 2 3] # g = f does not copy the data; g and f are two names for the
# same array, so changing an element through g also changes f.
# To prevent this, make an independent copy with the copy method instead.
[in] f = np.array([1,2,3])
[in] print(f)
[in] g = f.copy() # Use copy function.
[in] print(g)
[in] g[0] = 100
[in] print(g)
[in] print(f)
[out] [1 2 3]
[1 2 3]
[100 2 3]
[1 2 3] # f is unchanged because g is an independent copy.
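Whether two arrays share the same underlying data can be checked with `np.shares_memory`. The sketch below also shows that a basic slice is a view onto the original data rather than a copy; the names `h` and `s` are made up for this example.

```python
import numpy as np

f = np.array([1, 2, 3])
g = f            # a second name for the same array object
h = f.copy()     # an independent copy with its own data
s = f[:2]        # a basic slice is a view onto f's data

print(g is f)                    # True
print(np.shares_memory(f, h))    # False
print(np.shares_memory(f, s))    # True

s[0] = 100       # writing through the view changes f, but not the copy h
print(f)         # [100   2   3]
print(h)         # [1 2 3]
```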
Mathematics
import numpy as np
# Basic arithmetic
[in] a = np.array([1,2,3,4])
[in] print(a)
[in] print(a+2)
[in] print(a-2)
[in] print(a*2)
[in] print(a/2)
[out] [1 2 3 4]
[3 4 5 6]
[-1 0 1 2]
[2 4 6 8]
[0.5 1. 1.5 2.]
[in] b = np.array([5,6,7,8])
[in] print(a + b)
[out] [6 8 10 12]
[in] print(a ** 2)
[out] [1 4 9 16]
# Basic trig functions: Sine & Cosine (inputs are in radians)
[in] print(np.sin(a))
[in] print(np.cos(a))
[out] [ 0.84147098 0.90929743 0.14112001 -0.7568025 ]
[out] [ 0.54030231 -0.41614684 -0.9899925 -0.65364362]
# We can verify this by using sin^2(a) + cos^2 (a) = 1
[in] print((np.sin(a))**2 + (np.cos(a))**2)
[out] [1. 1. 1. 1.]
# Linear Algebra
# NumPy has a matrix multiplication function:
[in] c = np.ones((2, 3))
[in] print(c)
[in] d = np.full((3, 2), 2)
[in] print(d)
[in] print(np.matmul(c, d))
[out] [[1. 1. 1.]
[1. 1. 1.]]
[[2 2]
[2 2]
[2 2]]
[[6. 6.]
[6. 6.]]
# We can also use NumPy to find the determinant of a matrix
[in] e = np.identity(5)
[in] print(e)
[in] print(np.linalg.det(e)) # The determinant of an identity matrix is always 1
[out] [[1. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1.]]
1.0
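The same `np.linalg` module handles matrices other than the identity. As a small sketch on a 2 x 2 matrix picked for illustration, the determinant can be checked by hand (1*4 - 2*3 = -2), and `np.linalg.inv` returns the inverse when the determinant is non-zero.

```python
import numpy as np

m = np.array([[1.0, 2.0],
              [3.0, 4.0]])

det = np.linalg.det(m)
print(det)                       # close to -2, up to floating-point rounding

inv = np.linalg.inv(m)
# A matrix times its inverse gives the identity matrix (up to rounding).
print(np.allclose(m @ inv, np.identity(2)))   # True
```

Because the result is computed in floating point, compare it with `np.isclose` or `np.allclose` rather than `==`.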
# Statistics
[in] stats = np.array([[1,2,3],[4,5,6]])
[in] print(stats)
[in] print(np.min(stats))
[in] print(np.max(stats))
[in] print(np.min(stats, axis=1))
[in] print(np.max(stats, axis=0))
# axis=0 tells NumPy to perform the operation down the rows, giving one result per column;
# axis=1 performs it across the columns, giving one result per row.
[out] [[1 2 3]
[4 5 6]]
1
6
[1 4]
[4 5 6]
# Find the sum of each column (axis=0); use axis=1 for the sum of each row.
[in] print(np.sum(stats, axis=0))
[out] [5 7 9] # 1+4; 2+5; 3+6
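The effect of the `axis` argument is easiest to see by comparing the results and their shapes side by side. This is a short sketch on the same 2 x 3 array; the variable names are made up for this example.

```python
import numpy as np

stats = np.array([[1, 2, 3],
                  [4, 5, 6]])

col_sums = np.sum(stats, axis=0)    # one result per column: [5 7 9]
row_sums = np.sum(stats, axis=1)    # one result per row:    [ 6 15]
total = np.sum(stats)               # no axis: sum of every element, 21

print(col_sums, col_sums.shape)     # shape (3,)
print(row_sums, row_sums.shape)     # shape (2,)
print(total)

row_means = np.mean(stats, axis=1)  # the same axis rule applies to mean: [2. 5.]
print(row_means)
```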
Reorganizing Arrays
import numpy as np
# We can reorganize the existing arrays by doing the following:
[in] before = np.array([[1,2,3,4], [5,6,7,8]])
[in] print(before)
[in] print("The matrix named 'before' is a "+ str(before.shape) + " matrix.")
[in] after = before.reshape((4,2))
[in] print(after)
[in] print("The matrix named 'after' is a "+ str(after.shape) + " matrix.")
# Since we had 8 elements in total, the new shape must hold exactly 8 elements (rows x columns = 8).
[out] [[1 2 3 4]
[5 6 7 8]]
The matrix named 'before' is a (2, 4) matrix.
[[1 2]
[3 4]
[5 6]
[7 8]]
The matrix named 'after' is a (4, 2) matrix.
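Two details of `reshape` are worth a quick sketch: one dimension can be given as -1 so that NumPy infers it from the element count, and a shape that does not account for all 8 elements raises a `ValueError`. The names `auto`, `cube`, and `reshape_failed` are made up for this example.

```python
import numpy as np

before = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])

# -1 lets NumPy work out the missing dimension: 8 elements / 4 rows = 2 columns.
auto = before.reshape((4, -1))
print(auto.shape)        # (4, 2)

# Any shape whose dimensions multiply to 8 works, including 3-D shapes.
cube = before.reshape((2, 2, 2))
print(cube.shape)        # (2, 2, 2)

# A 3 x 3 shape needs 9 elements, so this reshape is rejected.
try:
    before.reshape((3, 3))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```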
# Vertically stacking vectors/matrices
[in] v1 = np.array([1,2,3,4])
[in] v2 = np.array([5,6,7,8])
[in] v3 = np.vstack([v1, v2, v1, v2])
[in] print(v3)
[in] print(v3.shape)
[out] [[1 2 3 4]
[5 6 7 8]
[1 2 3 4]
[5 6 7 8]]
(4, 4)
# Horizontally stacking vectors/matrices
[in] h1 = np.ones((2, 4))
[in] h2 = np.zeros((2, 2))
[in] print(h1)
[in] print(h2)
[in] h3 = np.hstack([h1, h2])
[in] print(h3)
[out] [[1. 1. 1. 1.]
[1. 1. 1. 1.]]
[[0. 0.]
[0. 0.]]
[[1. 1. 1. 1. 0. 0.]
[1. 1. 1. 1. 0. 0.]]
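For 2-dimensional arrays, both kinds of stacking can also be written with `np.concatenate` and an explicit axis, which makes the shape requirement visible. This is a small sketch; the array `top` is made up for this example.

```python
import numpy as np

h1 = np.ones((2, 4))
h2 = np.zeros((2, 2))

# Joining side by side (axis=1) requires the same number of rows.
h3 = np.concatenate([h1, h2], axis=1)
print(h3.shape)                                    # (2, 6)
print(np.array_equal(h3, np.hstack([h1, h2])))     # True

# Joining top to bottom (axis=0) requires the same number of columns.
top = np.ones((1, 4))
v3 = np.concatenate([h1, top], axis=0)
print(v3.shape)                                    # (3, 4)
print(np.array_equal(v3, np.vstack([h1, top])))    # True
```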
Miscellaneous
# We can use NumPy to load data stored in a text file.
import numpy as np
[in] filedata = np.genfromtxt('NumPy_testdata.txt', delimiter=',')
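# genfromtxt reads the values as floats by default. astype returns a new array,
# so the result has to be assigned back to filedata.
[in] filedata = filedata.astype('int32')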
[in] print(filedata)
[in] print(filedata.shape)
[out] [[ 1 13 21 11 196 75 4 3 34 6 7 8 0 1 2 3 4 5]
[ 3 42 12 33 766 75 4 55 6 4 3 4 5 6 7 0 11 12]
[ 1 22 33 11 999 11 2 1 78 0 1 2 9 8 7 1 76 88]]
(3, 18)
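To try the loading step without the data file, the sketch below feeds `np.genfromtxt` an in-memory text buffer instead of a file name. The three short rows are a made-up stand-in for the contents of `NumPy_testdata.txt`, which is not reproduced in this article.

```python
import io
import numpy as np

# Hypothetical comma-separated text standing in for the real data file.
text = "1,13,21\n3,42,12\n1,22,33\n"

raw = np.genfromtxt(io.StringIO(text), delimiter=",")
print(raw.dtype)     # float64: values are read as floats by default

data = raw.astype("int32")   # astype returns a new array; raw is unchanged
print(data.dtype)    # int32
print(data.shape)    # (3, 3)
print(data)
```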
# Boolean Masking and Advanced Indexing
[in] print(filedata > 25) # Element-wise check: True wherever an element is greater than 25.
[out] [[False False False False True True False False True False False False
False False False False False False]
[False True False True True True False True False False False False
False False False False False False]
[False False True False True False False False True False False False
False False False False True True]]
# Create an array that contains only the elements greater than 50 in filedata.
[in] print(filedata[filedata>50])
[out] [196 75 766 75 55 999 78 76 88]
# We can index with a list in NumPy
[in] a = np.array([1,2,3,4,5,6,7,8,9])
[in] print(a[[1, 2, 8]]) # Here, we're indexing the second, third, and ninth element in the array.
[out] [2 3 9]
# Check whether any/all values in each column (axis=0) or each row (axis=1) are > 50
[in] print(np.any(filedata>50, axis=0))
[in] print(np.all(filedata>50, axis=0))
[in] print(np.any(filedata>50, axis=1))
[in] print(np.all(filedata>50, axis=1))
[out] [False False False False True True False True True False False False
False False False False True True]
[False False False False True False False False False False False False
False False False False False False]
[True True True]
[False False False]
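Boolean masks can be combined with `&` (and), `|` (or), and `~` (not), and `np.where` can use a mask to replace values. The sketch below uses a small made-up array in place of `filedata`; note the parentheses around each comparison, which are needed because `&` binds more tightly than `>` and `<`.

```python
import numpy as np

data = np.array([[1, 13, 21, 11, 196, 75],
                 [3, 42, 12, 33, 766, 75]])

# Elements greater than 50 AND less than 100.
mask = (data > 50) & (data < 100)
print(data[mask])       # [75 75]

# ~ inverts the mask: everything that is NOT between 50 and 100.
print(data[~mask])

# Replace every value above 50 with 50 and keep the rest unchanged.
capped = np.where(data > 50, 50, data)
print(capped)
```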
Exercise: Try to index the highlighted elements.
# Create the 6 x 5 matrix as required.
[in] exer = np.array([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],
[16,17,18,19,20],[21,22,23,24,25],[26,27,28,29,30]])
[in] print(exer)
[in] print(exer.shape)
[out] [[ 1 2 3 4 5]
[ 6 7 8 9 10]
[11 12 13 14 15]
[16 17 18 19 20]
[21 22 23 24 25]
[26 27 28 29 30]]
(6, 5)
# Solution: Blue
[in] print(exer[2:4, 0:2])
[out] [[11 12]
[16 17]]
# Solution: Green
[in] print(exer[[0,1,2,3],[1,2,3,4]])
[out] [ 2 8 14 20]
# Solution: Red
[in] print(exer[[0,4,5], 3:5])
[out] [[ 4 5]
[24 25]
[29 30]]