NumPy (Python Library) Overview + Some code

Introduction to NumPy

NumPy (short for Numerical Python) is a powerful Python library for numerical computing. It provides an efficient and convenient interface for working with large multi-dimensional arrays and matrices of numerical data, as well as a wide range of mathematical functions for performing various operations on them.

Some key features of NumPy include:

  1. Multi-dimensional arrays: NumPy arrays can be used to represent multi-dimensional arrays and matrices of numerical data. These arrays are more efficient than Python's built-in lists, as they are stored in contiguous blocks of memory and allow for faster element access and manipulation. NumPy also provides functions for creating arrays of various shapes and sizes, including arrays of ones, zeros, random numbers, and identity matrices.
  2. Mathematical functions: NumPy provides a wide range of mathematical functions for performing various operations on arrays, including basic arithmetic, trigonometry, linear algebra, and statistics. These functions are optimized for performance and can be used to perform complex calculations efficiently.
  3. Broadcasting: NumPy arrays support broadcasting, which allows you to perform arithmetic operations between arrays of different shapes and sizes. This can simplify our code and make it more efficient by avoiding the need for explicit loops or nested function calls.
  4. Integration with other libraries: NumPy integrates with many other scientific computing libraries in Python, including Matplotlib, SciPy, and Pandas. This makes it easy to combine the capabilities of these libraries to perform complex data analysis and visualization tasks.
  5. Linear algebra: NumPy provides a wide range of functions for performing linear algebra operations, including matrix multiplication, computing eigenvalues and eigenvectors, and singular value decomposition.
  6. FFT: NumPy provides functions for performing Fast Fourier Transforms (FFT) on arrays, which are commonly used in signal processing and image analysis.
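
To make the broadcasting feature above concrete, here is a minimal sketch. The array names and values are illustrative assumptions, not taken from a real dataset. A length-3 vector is added to every row of a 2 x 3 array without writing a loop, and the result is compared with an explicit loop.

```python
import numpy as np

# A 2 x 3 array and a length-3 vector (illustrative values).
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
offsets = np.array([10, 20, 30])

# Broadcasting stretches `offsets` across both rows of `grid`.
shifted = grid + offsets
print(shifted)

# The same computation with explicit Python loops, for comparison.
looped = np.array([[value + offsets[j] for j, value in enumerate(row)]
                   for row in grid])
print(np.array_equal(shifted, looped))
```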

In conclusion, NumPy is an essential library for any data scientist or programmer working with numerical data in Python. Its efficient and convenient interface for working with large multi-dimensional arrays and matrices, coupled with its wide range of mathematical functions and integration with other scientific computing libraries, makes it a powerful tool for performing complex data analysis and visualization tasks.


A refresher on some post-secondary linear algebra

Matrix: In mathematics, a matrix is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns.

The size of a matrix: It is determined by its number of rows and columns. For example, a matrix with 3 rows and 4 columns is called a 3 x 4 matrix. To generalize this, an m x n matrix has m rows and n columns.

Special matrices

  • Identity matrix: An identity matrix is a square matrix (i.e., m x m, where m is a positive integer) with ones on the main diagonal and zeros elsewhere.
  • Zero matrix: A zero matrix is a matrix in which all the elements are zero.
  • Diagonal matrix: A diagonal matrix is a square matrix in which all the off-diagonal elements are zero.
  • There are more, but for the sake of NumPy learning, they are not too important.
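
As a quick illustration of the three special matrices above, the following sketch builds each one with NumPy and checks a defining property. The matrix `m` and the diagonal entries are arbitrary illustrative values.

```python
import numpy as np

identity = np.eye(3)             # 3 x 3 identity matrix
zero = np.zeros((2, 3))          # 2 x 3 zero matrix
diagonal = np.diag([4, 5, 6])    # 3 x 3 diagonal matrix built from its diagonal entries

m = np.arange(1, 10).reshape(3, 3)   # illustrative 3 x 3 matrix holding 1..9

# Multiplying by the identity leaves m unchanged.
print(np.array_equal(identity @ m, m))
# Adding the zero matrix leaves a same-shaped matrix unchanged.
print(np.array_equal(zero + np.ones((2, 3)), np.ones((2, 3))))
# Left-multiplying by a diagonal matrix scales each row of m by one diagonal entry.
print(diagonal @ m)
```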

Matrix multiplication

In mathematics, matrix multiplication is a binary operation that takes two matrices and produces another matrix. The product of two matrices A and B is defined only if the number of columns of A is equal to the number of rows of B.

To compute the product of two matrices A and B, take the dot product of each row of A with each column of B: the entry in row i and column j of the result is the dot product of row i of A and column j of B. The resulting matrix has the same number of rows as A and the same number of columns as B.

[Image: matrix multiplication illustration]
Source: https://www.basic-mathematics.com/multiply-matrices.html
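
The row-by-column rule can be sketched directly in code. The matrices `A` and `B` below are illustrative values chosen for this example. The loop fills in each entry as a dot product, and the result is compared with NumPy's built-in `@` operator.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2 x 3
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])         # 3 x 2

# Entry (i, j) of the product is the dot product of row i of A and column j of B.
rows, cols = A.shape[0], B.shape[1]
manual = np.zeros((rows, cols), dtype=A.dtype)
for i in range(rows):
    for j in range(cols):
        manual[i, j] = np.dot(A[i, :], B[:, j])

print(manual)
print(np.array_equal(manual, A @ B))
```

The product has 2 rows (as A does) and 2 columns (as B does), which matches the size rule stated above.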

Eigenvalues & Eigenvectors

In linear algebra, eigenvalues and eigenvectors are associated with square matrices. An eigenvector of a matrix A is a non-zero vector v that, when multiplied by A, results in a scalar multiple of itself, that is, Av = λv, where λ is a scalar called the eigenvalue corresponding to the eigenvector v.

Eigenvalues and eigenvectors have important applications in various fields such as physics, engineering, and data analysis. For example, in data analysis, eigenvalues and eigenvectors can be used in Principal Component Analysis (PCA) to identify the most important features in a dataset.
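
The defining relation Av = λv can be checked numerically. The sketch below uses an illustrative symmetric 2 x 2 matrix and `np.linalg.eig`, which returns the eigenvalues together with a matrix whose columns are the corresponding eigenvectors.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # illustrative symmetric matrix

eigenvalues, eigenvectors = np.linalg.eig(A)

# Column k of `eigenvectors` pairs with eigenvalues[k].
checks = []
for k in range(len(eigenvalues)):
    v = eigenvectors[:, k]
    lam = eigenvalues[k]
    checks.append(np.allclose(A @ v, lam * v))   # does A v equal lambda v?
    print(lam, checks[-1])
```

For this matrix the eigenvalues are 1 and 3, although the order in which NumPy returns them is not guaranteed.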


NumPy implementation in Python

All the notes and code are based on Keith Galli's NumPy tutorial, with minor modifications.

Before using NumPy, make sure it is properly installed on your machine. We can use Python IDEs like PyCharm or Wing 101, but Jupyter Lab/Notebook might be a better option since we can run specific lines of code as we wish.

The Basics

import numpy as np 
# Create an array
[in]  a = np.array([1, 2, 3])
[in]  print(a)
[out] [1 2 3]

# Create a 2 x 3 array with floating-point numbers
[in]  b = np.array([[9.1, 8.2, 7.3], [6.4, 5.5, 4.6]])
[in]  print(b)
[out] [[9.1 8.2 7.3]
       [6.4 5.5 4.6]]

# Get the shape of the array. 
[in]  b.shape
[out] (2, 3)
# -> i.e., b is a 2 x 3 array because it has 2 rows and 3 columns.

# Get the number of dimensions of the array.
[in]  b.ndim 
[out] 2
# -> i.e., b is a 2-dimensional array since all elements are arranged
# in rows and columns.
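
Besides `shape` and `ndim`, a few other array attributes are often useful when inspecting data. This is a small sketch using the same values as `b` above. The `small` array is an extra illustrative example showing how the data type affects memory use.

```python
import numpy as np

b = np.array([[9.1, 8.2, 7.3], [6.4, 5.5, 4.6]])

print(b.dtype)     # data type of each element
print(b.size)      # total number of elements
print(b.itemsize)  # bytes used by one element
print(b.nbytes)    # total bytes = size * itemsize

# Requesting a smaller integer type reduces the memory footprint.
small = np.array([1, 2, 3], dtype='int16')
print(small.itemsize, small.nbytes)
```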

Accessing/Changing Specific Elements, Rows and Columns.

# Always remember to import NumPy beforehand. 
import numpy as np
[in]  a = np.array([[1,2,3,4,5,6,7], [8,9,10,11,12,13,14]])
# -> a is a 2 x 7 array with 2 rows and 7 columns. 

# Access a specific element [r, c] (row r and column c) 
[in]  a[0, 5]
[out] 6

# Access a specific row 
[in]  print(a[0, :]) 
[out] [1 2 3 4 5 6 7]

# Access a specific column
[in]  print(a[:, 2])
[out] [ 3 10]

# Change a specific element in the array. 
[in]  a[1, 5] = 20
[in]  print(a)
[out] [[ 1  2  3  4  5  6  7]
       [ 8  9 10 11 12 20 14]]

# Change the whole column in the array 
[in]  a[:, 2] = [5, 6]
[in]  print(a)
[out] [[ 1  2  5  4  5  6  7]
       [ 8  9  6 11 12 20 14]]

# 3-dimensional array (i.e., a stack of 2-dimensional arrays)
[in]  b = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
[in]  print(b)
[out] [[[1 2]
        [3 4]]
      
       [[5 6]
        [7 8]]] 

# Get specific element in 3-dimensional arrays. 
[in]  print(b[0,1,1]) # First array; second row; second number
[out] 4

# Replace elements (similar to lower-dimensional arrays)
[in]  b[1,0,1] = 10 
[in]  print(b)
[out] [[[1  2]
        [3  4]]
      
       [[5 10]
        [7  8]]]

# Replace multiple elements at once 
[in]  b[:,:,0] = [[12, 24], [48, 96]]
[in]  print(b)
[out] [[[12  2]
        [24  4]]
      
       [[48 10]
        [96  8]]]        
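
Indexing also accepts `start:stop:step` slices and negative indices. The following sketch rebuilds the original 2 x 7 array (before any of the modifications above) so that it runs on its own.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [8, 9, 10, 11, 12, 13, 14]])

print(a[0, 1:6:2])    # row 0, columns 1, 3 and 5
print(a[1, -1])       # last element of row 1
print(a[:, ::-1])     # every row with its columns reversed
```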

Initializing Different Types of Arrays

import numpy as np

# Zero matrix
[in]  zero = np.zeros((3, 5)) # Creating a 3 x 5 zero matrix
[in]  print(zero)
[out] [[0. 0. 0. 0. 0.]
       [0. 0. 0. 0. 0.]
       [0. 0. 0. 0. 0.]]
# Here, we can change the data type to int8/int16/int32/int64 if 
# we'd like to get rid of the decimals. 

# All 1s matrix 
# Create a 3 x 5 matrix whose elements are all 1 and set the data type to int16.
[in]  one = np.ones((3, 5), dtype='int16')
[in]  print(one)
[out] [[1 1 1 1 1]
       [1 1 1 1 1]
       [1 1 1 1 1]]

# All "other number" matrix 
[in]  other_num = np.full((3, 5), 36, dtype='int16')
[in]  print(other_num)
[out] [[36 36 36 36 36]
       [36 36 36 36 36]
       [36 36 36 36 36]]

# Use "full_like" to reuse the structure of an existing array. 
[in]  a = np.array([[1,2,3,4,5,6,7], [8,9,10,11,12,13,14]])
[in]  b = np.full_like(a, 5)
[in]  print(b)     
[out] [[5 5 5 5 5 5 5]
       [5 5 5 5 5 5 5]] 

# Random decimal numbers 
[in]  c = np.random.rand(4, 3)
[in]  print(c)
[out] [[0.2799959  0.0205946  0.0418946 ]
       [0.79033484 0.49661239 0.66497414]
       [0.77573664 0.29134228 0.95059255]
       [0.24277333 0.97923166 0.32381059]]

# Random decimal numbers using the same shape of an existing array 
[in]  d = np.random.random_sample(a.shape)  
[in]  print(d)
[out] [[0.02079737 0.67971327 0.17267642 0.22453829 0.4227724  0.53196238
  0.5366661 ]
       [0.10864681 0.35630595 0.42413239 0.50289581 0.02159023 0.27639738
  0.2568261 ]]
# Here, we are getting a 2 x 7 array with random decimal numbers. 

# Random integer values (lowest value = -4, highest value = 10 - 1 = 9)
[in]  e = np.random.randint(-4, 10, size=(3, 5))
[in]  print(e)
[out] [[ 3 -2 -2  6  5]
       [ 0  4 -1  6  7]
       [ 3 -1  4 -3 -3]]       
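
Random output changes on every run, which can make results hard to reproduce. One option, sketched here as an alternative to the `np.random.rand` and `np.random.randint` calls above, is NumPy's `default_rng` generator with a fixed seed. The seed value is an arbitrary choice for illustration.

```python
import numpy as np

seed = 42  # arbitrary illustrative seed
rng = np.random.default_rng(seed)

floats = rng.random((4, 3))                 # uniform floats in [0, 1)
ints = rng.integers(-4, 10, size=(3, 5))    # integers from -4 to 9 inclusive

# A second generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(seed)
same_floats = rng_again.random((4, 3))
print(np.array_equal(floats, same_floats))
```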
 
# Create the identity matrix
[in]  I1 = np.identity(5)
[in]  print(I1) 
[out] [[1. 0. 0. 0. 0.]
       [0. 1. 0. 0. 0.]
       [0. 0. 1. 0. 0.]
       [0. 0. 0. 1. 0.]
       [0. 0. 0. 0. 1.]]

# Repeat an array
[in]  arr = np.array([[1,2,3], [4,5,6]])
[in]  r1 = np.repeat(arr, 3, axis=0) 
# 3 refers to the number of repetitions we'd like for the array; 
# axis=0 means it's the rows that we want repetition of. 
# If we want the repetition of columns, make axis=1. 
[in]  print(r1)
[out] [[1 2 3]
       [1 2 3]
       [1 2 3]
       [4 5 6]
       [4 5 6]
       [4 5 6]]  
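
For comparison, here is a short sketch of the `axis=1` case mentioned above, next to `np.tile`, which repeats the whole array as a block rather than repeating each row or column in place.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# axis=1 repeats each column, so every value appears twice in a row.
cols_repeated = np.repeat(arr, 2, axis=1)
print(cols_repeated)

# tile lays the whole array out twice vertically and once horizontally.
tiled = np.tile(arr, (2, 1))
print(tiled)
```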

# Be careful when copying the array
[in]  f = np.array([1,2,3])
[in]  print(f)
[in]  g = f
[in]  print(g)
[in]  g[0] = 100
[in]  print(g)  
[in]  print(f) 
[out] [1 2 3]
      [1 2 3]
      [100   2   3]
      [100   2   3]
# If we simply let g = f, both names refer to the same array, so changing
# an element through g changes the corresponding element in f as well.

# To prevent this from happening, use the copy method instead.
[in]  f = np.array([1,2,3])
[in]  print(f)
[in]  g = f.copy() # Use copy function. 
[in]  print(g)
[in]  g[0] = 100
[in]  print(g)  
[in]  print(f) 
[out] [1 2 3]
      [1 2 3]
      [100   2   3]
      [1 2 3]
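
The same caution applies to slices: basic slicing returns a view that shares memory with the original array. This sketch uses an illustrative array and `np.shares_memory` to show the difference between a sliced view and an explicit copy.

```python
import numpy as np

f = np.array([1, 2, 3, 4])

view = f[:2]            # basic slicing returns a view, not a copy
view[0] = 100
print(f)                # the change shows up in f as well
print(np.shares_memory(f, view))

independent = f[:2].copy()
independent[1] = -1
print(f)                # f is unaffected this time
print(np.shares_memory(f, independent))
```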

Mathematics

import numpy as np

# Basic arithmetic
[in]  a = np.array([1,2,3,4])
[in]  print(a)
[in]  print(a+2)
[in]  print(a-2)
[in]  print(a*2)
[in]  print(a/2)
[out] [1 2 3 4]
      [3 4 5 6]
      [-1 0 1 2]
      [2 4 6 8]
      [0.5 1. 1.5 2.]

[in]  b = np.array([5,6,7,8])
[in]  print(a + b)
[out] [6 8 10 12]    

[in]  print(a ** 2)
[out] [1 4 9 16]

# Basic trig functions: Sine & Cosine
[in]  print(np.sin(a))
[in]  print(np.cos(a))
[out] [ 0.84147098  0.90929743  0.14112001 -0.7568025 ]
[out] [ 0.54030231 -0.41614684 -0.9899925  -0.65364362]

# We can verify this by using sin^2(a) + cos^2 (a) = 1
[in]  print((np.sin(a))**2 + (np.cos(a))**2)
[out] [1. 1. 1. 1.]

# Linear Algebra
# NumPy has a matrix multiplication function:  
[in]  c = np.ones((2, 3)) 
[in]  print(c)
[in]  d = np.full((3, 2), 2)
[in]  print(d)
[in]  print(np.matmul(c, d))
[out] [[1. 1. 1.]
       [1. 1. 1.]]
      [[2 2]
       [2 2]
       [2 2]]
      [[6. 6.]
       [6. 6.]] 
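
The `@` operator is shorthand for `np.matmul`, and the shape rule from the refresher section still applies. A minimal sketch with the same `c` and `d` as above shows that the order of the operands changes the result shape and that incompatible shapes raise an error.

```python
import numpy as np

c = np.ones((2, 3))
d = np.full((3, 2), 2)

print((c @ d).shape)   # (2, 3) times (3, 2)
print((d @ c).shape)   # (3, 2) times (2, 3)

# (2, 3) times (2, 3) is undefined: 3 columns do not match 2 rows.
try:
    c @ c
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("Shape mismatch:", err)
```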

# We can also use NumPy to find the determinant of a matrix
[in]  e = np.identity(5)
[in]  print(e)
# The determinant of an identity matrix is always 1.
[in]  print(np.linalg.det(e))
[out] [[1. 0. 0. 0. 0.]
       [0. 1. 0. 0. 0.]
       [0. 0. 1. 0. 0.]
       [0. 0. 0. 1. 0.]
       [0. 0. 0. 0. 1.]]
      1.0
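
A less trivial case may be easier to verify by hand. The matrix `m` below is an illustrative choice. Its determinant is 4*6 - 7*2 = 10, and `np.linalg.inv` gives its inverse.

```python
import numpy as np

m = np.array([[4.0, 7.0],
              [2.0, 6.0]])

det = np.linalg.det(m)       # 4*6 - 7*2 = 10, up to floating-point rounding
inverse = np.linalg.inv(m)

print(round(det, 6))
# A matrix times its inverse gives the identity (up to rounding error).
print(np.allclose(m @ inverse, np.identity(2)))
```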

# Statistics
[in]  stats = np.array([[1,2,3],[4,5,6]])
[in]  print(stats)
[in]  print(np.min(stats))
[in]  print(np.max(stats))
[in]  print(np.min(stats, axis=1)) 
[in]  print(np.max(stats, axis=0)) 
# axis=0 collapses the rows, giving one result per column;
# axis=1 collapses the columns, giving one result per row.
[out] [[1 2 3]
       [4 5 6]]
      1
      6
      [1 4]
      [4 5 6]

# Find the sum of each column (use axis=1 for the sum of each row)
[in]  print(np.sum(stats, axis=0))
[out] [5 7 9] # 1+4; 2+5; 3+6
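
Because the `axis` argument is easy to mix up, here is a small sketch that puts both directions side by side, using the same `stats` values. The variable names are illustrative.

```python
import numpy as np

stats = np.array([[1, 2, 3],
                  [4, 5, 6]])

col_sums = np.sum(stats, axis=0)    # one value per column
row_sums = np.sum(stats, axis=1)    # one value per row
col_means = np.mean(stats, axis=0)  # mean of each column
overall_mean = np.mean(stats)       # no axis: uses every element

print(col_sums, row_sums, col_means, overall_mean)
```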

Reorganizing Arrays

import numpy as np
# We can reorganize the existing arrays by doing the following: 
[in]  before = np.array([[1,2,3,4], [5,6,7,8]])
[in]  print(before)
[in]  print("The matrix named 'before' is a "+ str(before.shape) + " matrix.")
[in]  after = before.reshape((4,2)) 
[in]  print(after)
[in]  print("The matrix named 'after' is a "+ str(after.shape) + " matrix.")
# Since we have 8 elements in total, the new shape must also hold exactly
# 8 elements (here, 4 x 2 = 8).
[out] [[1 2 3 4]
       [5 6 7 8]]
      The matrix named 'before' is a (2, 4) matrix.
      [[1 2]
       [3 4]
       [5 6]
       [7 8]]
      The matrix named 'after' is a (4, 2) matrix.
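
Two related behaviours are worth knowing, sketched here with the same `before` array: passing `-1` lets NumPy infer one dimension, and a shape that cannot hold all 8 elements raises a `ValueError`.

```python
import numpy as np

before = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# -1 asks NumPy to infer that dimension from the total element count.
inferred = before.reshape((-1, 2))
print(inferred.shape)

# Any shape works as long as the element count matches: 2 * 2 * 2 = 8.
cube = before.reshape((2, 2, 2))
print(cube.shape)

# 8 elements cannot fill a 3 x 3 grid, so this raises ValueError.
try:
    before.reshape((3, 3))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```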

# Vertically stacking vectors/matrices
[in]  v1 = np.array([1,2,3,4])
[in]  v2 = np.array([5,6,7,8])
[in]  v3 = np.vstack([v1, v1, v2, v2])
[in]  print(v3)
[in]  print(v3.shape)
[out] [[1 2 3 4]
       [1 2 3 4]
       [5 6 7 8]
       [5 6 7 8]]
      (4, 4)
          
# Horizontally stacking vectors/matrices
[in]  h1 = np.ones((2, 4))
[in]  h2 = np.zeros((2, 2))
[in]  print(h1)
[in]  print(h2)
[in]  h3 = np.hstack([h1, h2])
[in]  print(h3)
[out] [[1. 1. 1. 1.]
       [1. 1. 1. 1.]]
      [[0. 0.]
       [0. 0.]]
      [[1. 1. 1. 1. 0. 0.]
       [1. 1. 1. 1. 0. 0.]]
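
Both stacking helpers are closely related to `np.concatenate`. The sketch below reuses the arrays from this section and shows where they agree and where they differ for 1-dimensional inputs.

```python
import numpy as np

h1 = np.ones((2, 4))
h2 = np.zeros((2, 2))
v1 = np.array([1, 2, 3, 4])
v2 = np.array([5, 6, 7, 8])

# For 2-D inputs, hstack matches concatenate along axis=1.
print(np.array_equal(np.hstack([h1, h2]), np.concatenate([h1, h2], axis=1)))

# For 1-D inputs, vstack adds a new row axis, while concatenate joins end to end.
stacked = np.vstack([v1, v2])
joined = np.concatenate([v1, v2])
print(stacked.shape, joined.shape)
```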
                          

Miscellaneous

# We can use NumPy to load data stored in a text file. 
import numpy as np
[in]  filedata = np.genfromtxt('NumPy_testdata.txt', delimiter=',')
# astype returns a new array, so assign the result back to filedata.
[in]  filedata = filedata.astype('int32')
[in]  print(filedata)
[in]  print(filedata.shape)
[out] [[  1  13  21  11 196  75   4   3  34   6   7   8   0   1   2   3   4   5]
       [  3  42  12  33 766  75   4  55   6   4   3   4   5   6   7   0  11  12]
       [  1  22  33  11 999  11   2   1  78   0   1   2   9   8   7   1  76  88]]
      (3, 18)
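
If the text file is not at hand, the same loading steps can be tried on an in-memory string. This is a sketch only: the `text` contents are hypothetical and much shorter than the article's 'NumPy_testdata.txt'. It relies on `np.genfromtxt` accepting a file-like object such as `io.StringIO`.

```python
import io
import numpy as np

# Hypothetical file contents; the article's 'NumPy_testdata.txt' is not reproduced here.
text = "1,13,21\n3,42,12\n1,22,33\n"

data = np.genfromtxt(io.StringIO(text), delimiter=',')
print(data.dtype)     # genfromtxt returns floats by default
print(data.shape)

as_int = data.astype('int32')   # astype returns a new array; data itself stays float
print(as_int)
```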

# Boolean Masking and Advanced Indexing 
[in]  print(filedata > 25) # Check, element by element, whether each value is greater than 25. 
[out] [[False False False False  True  True False False  True False False False
        False False False False False False]
       [False  True False  True  True  True False  True False False False False
        False False False False False False]
       [False False  True False  True False False False  True False False False
        False False False False  True  True]]

# Create an array that contains only the elements greater than 50 in filedata. 
[in]  print(filedata[filedata>50])
[out] [196  75 766  75  55 999  78  76  88]

# We can index with a list in NumPy
[in]  a = np.array([1,2,3,4,5,6,7,8,9])
[in]  print(a[[1, 2, 8]]) # Here, we're indexing the second, third, and ninth elements in the array. 
[out] [2 3 9] 

# Check whether any/all values in each column (axis=0) or in each row (axis=1) are > 50
[in]  print(np.any(filedata>50, axis=0)) 
[in]  print(np.all(filedata>50, axis=0))
[in]  print(np.any(filedata>50, axis=1))
[in]  print(np.all(filedata>50, axis=1))   
[out] [False False False False  True  True False  True  True False False False
       False False False False  True  True]
      [False False False False  True False False False False False False False
       False False False False False False]
      [ True  True  True]
      [False False False]
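
Conditions can also be combined. The sketch below uses an illustrative subset of values in the style of `filedata` and the element-wise operators `&` (and) and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`.

```python
import numpy as np

# Illustrative subset of values in the style of the article's data.
data = np.array([[1, 13, 21, 11, 196, 75],
                 [3, 42, 12, 33, 766, 75],
                 [1, 22, 33, 11, 999, 11]])

# Keep only the values strictly between 50 and 100.
between = (data > 50) & (data < 100)
print(data[between])

# ~ inverts the mask: count the values outside that range.
outside = ~between
print(np.count_nonzero(outside))
```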

Exercise: Try to index the highlighted elements.

[Image: matrix with highlighted elements]
Credit: Keith Galli

# Create the 6 x 5 matrix as required. 
[in]  exer = np.array([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],
[16,17,18,19,20],[21,22,23,24,25],[26,27,28,29,30]])
[in]  print(exer)
[in]  print(exer.shape)
[out] [[ 1  2  3  4  5]
       [ 6  7  8  9 10]
       [11 12 13 14 15]
       [16 17 18 19 20]
       [21 22 23 24 25]
       [26 27 28 29 30]]
      (6, 5)

# Solution: Blue 
[in]  print(exer[2:4, 0:2])
[out] [[11 12]
       [16 17]]

# Solution: Green
[in]  print(exer[[0,1,2,3],[1,2,3,4]])
[out] [ 2  8 14 20]

# Solution: Red
[in]  print(exer[[0,4,5], 3:5])
[out] [[ 4  5]
       [24 25]
       [29 30]]  
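
The Green solution works because two index lists are paired element by element, which is different from how two slices behave. A minimal sketch of the difference, rebuilding the same matrix with `np.arange`:

```python
import numpy as np

exer = np.arange(1, 31).reshape(6, 5)   # same 6 x 5 matrix as in the exercise

# Two index lists are paired element by element: (0,1), (1,2), (2,3), (3,4).
paired = exer[[0, 1, 2, 3], [1, 2, 3, 4]]
print(paired)

# Two slices select the full rectangular block instead.
block = exer[0:4, 1:5]
print(block.shape)

# The paired result is the main diagonal of that block.
print(np.array_equal(paired, np.diag(block)))
```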
                          
