Common operations with NumPy
Here we'll work through some array operations with the NumPy library, compare NumPy with list comprehensions and lambda functions, and measure the difference in performance between these approaches.
Study notebook
Go to Jupyter Notebook to see the concepts that will be covered about Operations with Arrays in NumPy. Note: Important functions, outputs, and terms are bold to facilitate understanding — at least mine.
View versions
Let's start by checking the Python version and the NumPy package version. This applies to almost any package: always check which version you are working with.
import sys
import numpy as np
print(sys.version)
np.__version__
3.8.1       # Python version
'1.19.5'    # NumPy version
Create an array in NumPy
Now we create an array with NumPy's arange function rather than the built-in range. The difference is subtle: arange returns a NumPy array, while range returns a lazy sequence of plain Python integers.
array1 = np.arange(15); array1
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
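To make the difference between arange and range concrete, here is a minimal sketch. The variable names (r, a, doubled, range_supports_multiply) are illustrative and not part of the original notebook.

```python
import numpy as np

r = range(15)        # built-in: a lazy sequence of Python ints
a = np.arange(15)    # NumPy: an ndarray holding all 15 values

print(type(r).__name__)   # range
print(type(a).__name__)   # ndarray

# Arithmetic applies to every element of an ndarray at once...
doubled = a * 2
print(doubled[:5])        # [0 2 4 6 8]

# ...whereas a range object does not support multiplication at all.
try:
    r * 2
    range_supports_multiply = True
except TypeError:
    range_supports_multiply = False
print(range_supports_multiply)   # False
```

This element-wise behavior is what the rest of the article builds on.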
Help?
When we have a question about any object in Python, we can call help by typing ? followed by the name of the object. This ? syntax is provided by IPython and Jupyter.
?array1
The full help for that object opens: a description of what the object is, how to use it, the parameters it accepts, its attributes and methods, related functions that can create the same type of object, and so on.
Mathematical Methods directly in arrays
Once the array has been created with NumPy, we can call its mathematical methods directly. In Jupyter, type the array name followed by a dot and press TAB to list every available method.
Average
array1.mean()
7.0
Sum
array1.sum()
105
Minimum value
array1.min()
0
Maximum value
array1.max()
14
Standard deviation
array1.std()
4.320493798938574
List Comprehension in NumPy Arrays
A NumPy array such as array1 can be used in a list comprehension just like any other iterable.
Translating: for each value of x in array1, multiply x by itself:
[x * x for x in array1]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196]
Translating: for each value of x in array1, if the remainder of x divided by two is zero, return x multiplied by itself. That is, the operation applies only to the even numbers.
[x * x for x in array1 if x % 2 == 0]
[0, 4, 16, 36, 64, 100, 144, 196]
So, when working with NumPy, remember that list comprehensions still work.
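For comparison, the same squaring can be written without a Python-level loop. This is a small sketch; squares_list and squares_array are illustrative names, not from the original notebook.

```python
import numpy as np

array1 = np.arange(15)

# List comprehension: loops in Python and returns a list.
squares_list = [x * x for x in array1]

# Vectorized form: a single expression that returns a NumPy array.
squares_array = array1 * array1

print(type(squares_list).__name__)    # list
print(type(squares_array).__name__)   # ndarray
print(squares_array.tolist() == [int(v) for v in squares_list])   # True
```

Both give the same values; the difference is the type of the result and where the loop runs.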
Lambda Function in NumPy Arrays
Here we pass a lambda to filter. The lambda returns True only when the remainder of x divided by 2 equals 0, filter keeps the elements of array1 for which that holds, and list collects the results:
list(filter(lambda x: x % 2 == 0, array1))
[0, 2, 4, 6, 8, 10, 12, 14]
What we did above can be simplified further:
array1 % 2 == 0
array([ True, False, True, False, True, False, True, False, True, False, True, False, True, False, True])
We take array1 and ask whether the remainder of division by 2 equals 0. The result is an array of True/False values. With this simple notation we get what would otherwise take the map function: a True or False for each value of array1, depending on whether it meets the condition.
Slicing notation: powerful!
What if we put the condition above inside the index brackets of array1?
array1[array1 % 2 == 0]
array([0, 2, 4, 6, 8, 10, 12, 14])
NumPy opens up a sea of possibilities. Everything done so far can be replaced with this simple notation, which the NumPy documentation calls boolean indexing. It works with NumPy arrays, not with plain Python lists, and it is very important for Data Science work in Python.
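The three approaches seen so far can be put side by side to confirm they select the same values. A minimal sketch; the names via_comprehension, via_filter, mask, and via_mask are illustrative.

```python
import numpy as np

array1 = np.arange(15)

via_comprehension = [x for x in array1 if x % 2 == 0]
via_filter = list(filter(lambda x: x % 2 == 0, array1))

mask = array1 % 2 == 0     # boolean array, one entry per element
via_mask = array1[mask]    # keeps the elements where mask is True

print(via_mask.tolist())   # [0, 2, 4, 6, 8, 10, 12, 14]
print(int(mask.sum()))     # 8, the number of True entries
print([int(v) for v in via_comprehension] == via_mask.tolist())   # True
print([int(v) for v in via_filter] == via_mask.tolist())          # True
```

Storing the mask in a variable is optional, but it lets you reuse the same condition for selecting and for counting.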
Evaluating Performance
We will use the %timeit magic command. It belongs to IPython, the kernel behind Jupyter Notebook, and measures the execution time of a statement.
Translating: for each value of x in array1, return x if the remainder of x divided by 2 is 0, that is, if x is even. And, of course, measure the time of the operation.
%timeit [x for x in array1 if x % 2 == 0]
6.03 μs ± 135 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The list comprehension took 6.03 microseconds. With the slicing notation the time drops to 2.16 microseconds, almost 3x faster:
%timeit array1[array1 % 2 == 0]
2.16 μs ± 65.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
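Outside Jupyter, a similar measurement can be made with the timeit module from the standard library. This is a sketch, not the method used for the numbers above: the helper functions and the repetition count n are assumptions chosen for illustration, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

array1 = np.arange(15)

def with_comprehension():
    # Python-level loop over the array
    return [x for x in array1 if x % 2 == 0]

def with_mask():
    # boolean indexing, evaluated inside NumPy
    return array1[array1 % 2 == 0]

n = 10_000  # number of calls to time (arbitrary choice)
t_comp = timeit.timeit(with_comprehension, number=n)
t_mask = timeit.timeit(with_mask, number=n)

print(f"list comprehension: {t_comp / n * 1e6:.2f} microseconds per call")
print(f"boolean indexing:   {t_mask / n * 1e6:.2f} microseconds per call")
```

With only 15 elements the gap is modest; try a larger array to see how the two approaches scale on your machine.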
Logical Operators
We can also use comparison and logical operators with NumPy arrays.
Ex.1
Is each element of array1 greater than 8? The check is made element by element:
array1 > 8
array([False, False, False, False, False, False, False, False, False, True, True, True, True, True, True])
Ex.2
The same expression can be placed inside the index brackets, which returns the values that meet the condition instead of True/False:
array1[array1 > 8]
array([9, 10, 11, 12, 13, 14])
Ex.3
We can combine conditions with the element-wise and operator, &. Both conditions must be true, and each one needs its own parentheses because & binds more tightly than the comparison operators:
(array1 > 9) & (array1 < 12)
array([False, False, False, False, False, False, False, False, False, False, True, True, False, False, False])
Ex.4
We can combine conditions with the element-wise or operator, |. At least one of the two conditions must be true:
(array1 > 13) | (array1 < 12)
array([ True, True, True, True, True, True, True, True, True, True, True, True, False, False, True])
Ex.5
We can also place the or operator inside the index brackets:
array1[(array1 > 13) | (array1 < 12)]
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14])
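A common stumbling block is reaching for Python's and keyword instead of &. The sketch below shows why that fails on arrays and shows two related tools, np.logical_and and the ~ (not) operator. The variable names are illustrative.

```python
import numpy as np

array1 = np.arange(15)

# Python's `and` needs a single True/False value, so it fails on arrays.
try:
    (array1 > 9) and (array1 < 12)
    and_worked = True
except ValueError:
    and_worked = False
print(and_worked)   # False

# & and np.logical_and give the same element-wise result.
both = (array1 > 9) & (array1 < 12)
same = np.logical_and(array1 > 9, array1 < 12)
print(np.array_equal(both, same))   # True

# ~ negates a mask: here, the elements that are NOT greater than 8.
not_greater = array1[~(array1 > 8)]
print(not_greater.tolist())   # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```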
Ex.6
We can create a NumPy array from a list comprehension:
array2 = np.array([x ** 3 for x in range(15)]); array2
array([0, 1, 8, 27, 64, 125, 216, 343, 512, 729, 1000, 1331, 1728, 2197, 2744])
Ex.7
A NumPy array built from a list comprehension that returns True for each value of x in the range:
array2 = np.array([True for x in range(15)]); array2
array([ True, True, True, True, True, True, True, True, True, True, True, True, True, True, True])
Ex.8
Use array2 from Ex.7 as the index for array1. Because every entry of the mask is True, every element is returned:
array1[array2]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
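An all-True mask is the simplest case. As a variation, the mask can carry a real condition; the sketch below builds one from a list comprehension. The name mask and the multiples-of-3 condition are assumptions chosen for illustration.

```python
import numpy as np

array1 = np.arange(15)

# One True/False per position: True where the position is a multiple of 3.
mask = np.array([x % 3 == 0 for x in range(15)])

print(mask.dtype)              # bool
print(array1[mask].tolist())   # [0, 3, 6, 9, 12]
```

The mask must have one entry per element of the array it indexes, which is why range(15) matches the 15 elements of array1.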
Let's continue from here with another Jupyter Notebook that covers concatenating, joining, and splitting arrays in NumPy.
This relates to the split-apply-combine technique: we can split an array, apply a method, an analysis, or some calculation to the parts, and then combine the results. This is very useful in many data manipulation and organization tasks.
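As a rough illustration of split-apply-combine, the sketch below splits a small array into three parts, sums each part, and collects the sums into a new array. The array values and the choice of sum as the applied operation are assumptions made for the example.

```python
import numpy as np

values = np.arange(12)

parts = np.split(values, 3)              # split: three equal parts
sums = [part.sum() for part in parts]    # apply: one sum per part
combined = np.array(sums)                # combine: back into one array

print([p.tolist() for p in parts])   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(combined.tolist())             # [6, 22, 38]
```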
Concatenating Arrays
First, we create array1 with the ones function, which creates an array filled with 1s:
array1 = np.ones(4)
Now we create array2 with another function, arange, holding 15 elements (0 to 14):
array2 = np.arange(15)
Next, we call NumPy's concatenate function to join array1 and array2:
array_conc = np.concatenate((array1, array2)); array_conc
array([1., 1., 1., 1., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14.])
In the output above, the first part is easy to identify as the four 1s generated by np.ones for array1, followed by the values of array2, from 0 to 14.
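Notice the trailing dots in that output: every value became a float. The sketch below shows where that comes from and one way to keep integers, by passing dtype=int to np.ones. The name int_conc is illustrative.

```python
import numpy as np

array1 = np.ones(4)        # floats by default
array2 = np.arange(15)     # integers

array_conc = np.concatenate((array1, array2))
print(array1.dtype)        # float64
print(array_conc.dtype)    # float64: the integers were converted to floats

# Asking np.ones for integers keeps the concatenated result integer too.
int_conc = np.concatenate((np.ones(4, dtype=int), array2))
print(int_conc[:6].tolist())   # [1, 1, 1, 1, 0, 1]
```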
Joining Arrays
Let’s create two arrays, a and b:
a = np.ones((3,3))
b = np.zeros((3,3))
print(a)
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
print(b)
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
vstack & hstack
Let's apply the vstack and hstack functions. If we don't know how one of these functions works, we call help!
# stack arrays in vertical sequence
?np.vstack
# stack arrays in horizontal sequence
?np.hstack
vstack function
Stacks the arrays vertically, adding rows:
np.vstack((a,b))
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
hstack function
Stacks the arrays horizontally, adding columns:
np.hstack((a,b))
array([[1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.]])
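For two-dimensional arrays like these, vstack and hstack give the same result as concatenate along axis 0 and axis 1. A short sketch to check the shapes and the equivalence (v and h are illustrative names):

```python
import numpy as np

a = np.ones((3, 3))
b = np.zeros((3, 3))

v = np.vstack((a, b))   # rows of b go below the rows of a
h = np.hstack((a, b))   # columns of b go to the right of a

print(v.shape)   # (6, 3)
print(h.shape)   # (3, 6)

print(np.array_equal(v, np.concatenate((a, b), axis=0)))   # True
print(np.array_equal(h, np.concatenate((a, b), axis=1)))   # True
```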
Creating more arrays
We'll create more arrays to run other stack examples:
a = np.array([0, 1, 2])
b = np.array([3, 4, 5])
c = np.array([6, 7, 8])
column_stack function
This function stacks one-dimensional arrays as columns, forming a two-dimensional array:
np.column_stack((a, b, c))
array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])
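One way to picture column_stack with one-dimensional inputs: each input becomes a column, so the result is the transpose of what vstack would give. A small sketch, with cols and rows as illustrative names:

```python
import numpy as np

a = np.array([0, 1, 2])
b = np.array([3, 4, 5])
c = np.array([6, 7, 8])

cols = np.column_stack((a, b, c))   # a, b, c become columns
rows = np.vstack((a, b, c))         # a, b, c become rows

print(cols.shape)                     # (3, 3)
print(cols[:, 0].tolist())            # [0, 1, 2], the first column is a
print(np.array_equal(cols, rows.T))   # True
```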
split Arrays
Below we use the arange function to create an array of 16 elements and apply reshape to turn the one-dimensional array into a 4x4 two-dimensional array:
array3 = np.arange(16).reshape((4,4)); array3
array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11],
       [12, 13, 14, 15]])
hsplit function
This function splits array3 horizontally, that is, column-wise. Passing 2 as the second argument divides array3 into two equal parts, p1 and p2:
[array3_p1, array3_p2] = np.hsplit(array3, 2)
array3_p1
array([[ 0, 1],
       [ 4, 5],
       [ 8, 9],
       [12, 13]])
array3_p2
array([[ 2, 3],
       [ 6, 7],
       [10, 11],
       [14, 15]])
vsplit function
This function splits array3 vertically, that is, row-wise. Passing 2 as the second argument divides array3 into two equal parts, p1 and p2:
[array3_p1, array3_p2] = np.vsplit(array3, 2)
array3_p1
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
array3_p2
array([[8, 9, 10, 11],
       [12, 13, 14, 15]])
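Both examples split four rows or columns into two equal parts. When the count does not divide evenly, hsplit raises an error; np.array_split is a NumPy function that accepts uneven parts. The sketch below is an addition for illustration, with equal_split_ok, parts, and restored as illustrative names.

```python
import numpy as np

array3 = np.arange(16).reshape((4, 4))

# 4 columns cannot be divided into 3 equal parts, so hsplit fails.
try:
    np.hsplit(array3, 3)
    equal_split_ok = True
except ValueError:
    equal_split_ok = False
print(equal_split_ok)   # False

# array_split allows parts of different widths.
parts = np.array_split(array3, 3, axis=1)
print([p.shape for p in parts])   # [(4, 2), (4, 1), (4, 1)]

# Stacking the parts horizontally restores the original array.
restored = np.hstack(parts)
print(np.array_equal(restored, array3))   # True
```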
Save and Load data with NumPy
We create the data object from array3:
data = array3
save
Use the save function to write the array to disk at the end of a process, so you don't have to redo the work next time. NumPy appends the .npy extension to the file name:
np.save('data_saved_v1', data)
load
Use the load function to read the array back:
loaded_data = np.load('data_saved_v1.npy'); loaded_data
array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11],
       [12, 13, 14, 15]])
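To try the save and load round trip without leaving files behind, the sketch below writes into a temporary directory. The use of tempfile and the variable names base and files are assumptions added for the example; the file name is the one used above.

```python
import os
import tempfile

import numpy as np

data = np.arange(16).reshape((4, 4))

with tempfile.TemporaryDirectory() as tmp_dir:
    base = os.path.join(tmp_dir, "data_saved_v1")
    np.save(base, data)                    # NumPy adds the .npy extension
    files = os.listdir(tmp_dir)
    loaded_data = np.load(base + ".npy")   # read the array back

print(files)                               # ['data_saved_v1.npy']
print(np.array_equal(loaded_data, data))   # True
```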
This material works as a handy reference whenever you need to manipulate arrays with the NumPy library.
And there we have it. I hope you have found this helpful. Thank you for reading.