Introduction to NumPy

NumPy is a popular Python library used for numerical operations, particularly in the domains of data science and machine learning. It provides a multidimensional array object, various functions for array manipulation, linear algebra operations, statistical computations, and much more. In this article, we will explore the key features and functionalities of NumPy, along with detailed examples of how it is used in machine learning and data science.

We will be using Google Colaboratory Python notebooks to avoid setup and environment delays. The focus of this article is to get you up and running in machine learning with Python, and we can do everything we need there.


Installing NumPy

Before diving into the examples, it is important to have NumPy installed in your Python environment. You can install it using pip, the package installer for Python, by running the following command:

pip install numpy
        

Once installed, you can import NumPy into your Python scripts or notebooks using the following command:

import numpy as np
        

NumPy Arrays

The fundamental data structure of NumPy is the ndarray, short for N-dimensional array. It represents a grid of values, all of the same type, and is indexed by a tuple of non-negative integers. With NumPy arrays, you can perform efficient mathematical operations on entire data arrays, rather than iterating through individual elements.
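To make the contrast with element-by-element iteration concrete, here is a minimal sketch comparing a plain Python loop with the equivalent whole-array expression. The variable names are illustrative and not part of the original article.

```python
import numpy as np

# Illustrative data; the names below are chosen for this sketch only.
prices = [10.0, 20.0, 30.0]

# Plain Python: visit each element in turn.
doubled_list = [p * 2 for p in prices]

# NumPy: one expression applies the operation to every element.
doubled_arr = np.array(prices) * 2

print(doubled_list)  # [20.0, 40.0, 60.0]
print(doubled_arr)   # [20. 40. 60.]
```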

Creating NumPy Arrays

  1. Transforming a Standard List

You can create NumPy arrays from Python lists or tuples using the array() function.

  • create a normal Python list and check its type:

my_list = [1, 2, 3, 4]
print(type(my_list)) # Output: <class 'list'>        

Output will be


<class 'list'>        

Now, let's convert it to a NumPy array and check the type as follows:

import numpy as np
arr = np.array(my_list)
print(type(arr))        

Output:

<class 'numpy.ndarray'>

# Note that the type is 'numpy.ndarray'        


Here is a full example:

import numpy as np

# Create a 1-dimensional array
arr1d = np.array([1, 2, 3, 4, 5])
print(arr1d)
# Output: [1 2 3 4 5]


        

Data Type (dtype)

By default, an array created from Python integers has the data type int64 on most 64-bit platforms:

import numpy as np


arr1 = np.array([1, 2, 3, 4])


arr1.dtype        


dtype('int64')        

You can set the type explicitly with the dtype argument:

import numpy as np


arr1 = np.array([1, 2, 3, 4], dtype='int8')
arr1.dtype        


dtype('int8')        
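
An existing array can also be converted with the astype() method. The sketch below is an added illustration rather than part of the original walkthrough; it also uses np.iinfo to show why small types need care, since int8 only holds values from -128 to 127.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# astype() returns a converted copy; the original array keeps its dtype.
as_float = arr.astype('float64')
print(as_float)        # [1. 2. 3. 4.]
print(as_float.dtype)  # float64

# np.iinfo reports the range an integer dtype can represent.
int8_info = np.iinfo(np.int8)
print(int8_info.min, int8_info.max)  # -128 127
```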


Create a 2-dimensional array.

Let's do the same for a 2-dimensional array, where the difference in the printed output is easier to see:

  • Create a 2-dimensional list

list2d = [[1, 2, 3], [4, 5, 6], [4, 5, 6]]
print(type(list2d))
print(list2d)
        

Output:

<class 'list'>
[[1, 2, 3], [4, 5, 6], [4, 5, 6]]
        

Now convert it to np.array


import numpy as np
arr = np.array(list2d)
print(type(arr))
print(arr)
        

Output

<class 'numpy.ndarray'>
[[1 2 3]
 [4 5 6]
 [4 5 6]]        

Note the difference in the format. The array now prints as a 2D matrix with 3 rows and 3 columns, one row per line and without commas, whereas the Python list prints on a single line.

# Create a 2-dimensional array
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2d)
# Output: 
# [[1 2 3]
#  [4 5 6]]
        

NumPy provides several other functions to create arrays with specific initial values, such as zeros(), ones(), and arange(). You can also create arrays from existing data using functions like copy() or by reading data from files.


2. Built-in Functions

NumPy provides several built-in functions that you can use to create arrays with specific initial values. Here are some commonly used functions:

1. zeros(shape, dtype=float):

Creates an array filled with zeros. Here, shape is a tuple giving the size of each dimension; for a 2D array it is (rows, columns), as follows:

import numpy as np

arr = np.zeros((2, 3)) 
print(arr)        


Output:

[[0. 0. 0.]
 [0. 0. 0.]]





2. `ones(shape, dtype=None)`:

Creates an array filled with ones. It works like zeros(), but every element is 1 instead of 0.




import numpy as np

arr = np.ones((3, 2))
print(arr)
# Output:
# [[1. 1.]
#  [1. 1.]
#  [1. 1.]]
        


3. full(shape, fill_value, dtype=None):

Creates an array filled with a specified value.

import numpy as np

arr = np.full((2, 2), 5)
print(arr)


Output:

[[5 5]
 [5 5]]

4. `eye(N, M=None, k=0, dtype=None)`:

Creates a 2-D array with ones on the diagonal and zeros elsewhere (identity matrix).




import numpy as np

arr = np.eye(3)
print(arr)
# Output:
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
        

Example:

Using NumPy, create a 6x6 identity matrix and add a scalar value of 10 to it. Multiply the resulting matrix with a 6x1 column vector filled with random integers between 1 and 20. Print the resulting vector.

Solution:

import numpy as np

# Create a 6x6 identity matrix
identity_matrix = np.eye(6)

print("Original Identity Matrix:")
print(identity_matrix)

# Add a scalar value of 10 to the identity matrix
result_matrix = identity_matrix + 10

print("Matrix after adding 10:")
print(result_matrix)

# Create a 6x1 column vector with random integers between 1 and 20
random_vector = np.random.randint(1, 21, (6, 1))

print("Random 6x1 Column Vector:")
print(random_vector)

# Multiply the result matrix with the random vector
result_vector = np.dot(result_matrix, random_vector)

print("Resulting Vector:")
print(result_vector)
        


5. random:

numpy.random is a module in the NumPy library that provides functions for generating random numbers. Here are some of its most commonly used functions:

np.random.rand(d0, d1, ..., dn):

This function generates random numbers from a uniform distribution over the interval [0, 1). It takes the dimensions of the output as separate arguments (for a 2D array: rows, columns) rather than as a tuple, and returns an array of random numbers of that shape.

import numpy as np
rand_array = np.random.rand(2, 3)
print(rand_array)
        

Output (your values will differ on each run):

[[0.46165452 0.79167545 0.88772773]
 [0.86935969 0.40772606 0.47122735]]
        

np.random.randn:

This function generates random numbers from a standard normal distribution (mean 0 and variance 1). Like rand, it takes the dimensions of the output as separate arguments and returns an array of random numbers of that shape.

randn_array = np.random.randn(2, 3)
print(randn_array)
        

Output:

[[ 0.58312772 -0.35265183 -0.24225232]
 [ 0.97225141 -0.13613563  0.54331297]]
        

np.random.randint:

This function generates random integers between a specified low (inclusive) and high (exclusive) value. It takes the low, high, and size of the output as input, and returns an array of random integers.

Here is the syntax for the function:

np.random.randint(low, high=None, size=None, dtype=int)
        

Let's break down the parameters of this function:

  • low: The lowest (inclusive) integer value to be generated.
  • high: The highest (exclusive) integer value to be generated. If this parameter is not specified, the integers are drawn from 0 (inclusive) up to low (exclusive).
  • size: The shape of the output array. If this parameter is not specified, a single integer value will be generated. If it is an integer, a 1-dimensional array of the specified size will be generated. If it is a tuple, a multi-dimensional array with the specified shape will be generated.
  • dtype: The desired data type of the output array. The default is int.

Now, let's see a few examples of using np.random.randint:

Example 1:

import numpy as np
arr = np.random.randint(1, 10, 5)
print(arr)
        

Output:

[6 3 7 5 8]
        

In this example, np.random.randint(1, 10, 5) generates a 1-dimensional array of length 5, where each element is a random integer from 1 to 9 (the upper bound, 10, is exclusive).

Example 2:

import numpy as np
arr = np.random.randint(1, 10, (2, 3))
print(arr)
        

Output:

[[7 9 5]
 [3 4 1]]


In this example, np.random.randint(1, 10, (2, 3)) generates a 2-dimensional array of shape (2, 3), where each element is a random integer from 1 to 9 (the upper bound, 10, is exclusive).

Example 3:

import numpy as np

arr = np.random.randint(1, 101, 10)

print(arr)        

Output:

[84 28 69 97 14 32 31 88 75 94]
        

  • np.random.randint(1, 101, 10): This generates a 1-dimensional NumPy array of 10 random integers between 1 and 100 (inclusive).

np.random.choice:

This function generates random samples from a given 1-D array. It takes the array and the size of the output as input, and returns an array of random samples.

array = np.array([1, 2, 3, 4, 5])
choice_array = np.random.choice(array, size=(2, 3))
print(choice_array)
        

Output:

[[2 1 4]
 [1 4 5]]
        

These are just a few examples of commonly used functions in numpy.random. Many more are available, such as np.random.shuffle, np.random.permutation, and np.random.uniform, which can be useful in statistical simulations and data analysis tasks.
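
Because these functions return different values on every run, results are easier to reproduce when the generator is seeded. Here is a minimal sketch, assuming an arbitrary seed of 42 (any integer works): np.random.seed() fixes the sequence produced by the functions shown above.

```python
import numpy as np

# Seed the global generator so the same numbers come out on every run.
np.random.seed(42)
first = np.random.randint(1, 10, 5)

# Re-seeding with the same value replays the same sequence.
np.random.seed(42)
second = np.random.randint(1, 10, 5)

print(np.array_equal(first, second))  # True
```

Newer code often creates a dedicated generator with np.random.default_rng(seed) instead of seeding the global state, but the functions used in this article work with np.random.seed().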


6. `arange(start, stop=None, step=1, dtype=None)`:

Creates an array with evenly spaced values in a given range. This is similar to Python's built-in range function: the sequence starts at the first number (inclusive) and stops before the second number (exclusive).

You can also pass a step as the third parameter to control the spacing between values (the default is 1).




import numpy as np

arr = np.arange(1, 6)
print(arr)
# Output: [1 2 3 4 5]

arr2 = np.arange(0, 1, 0.2)
print(arr2)
# Output: [0.  0.2 0.4 0.6 0.8]
        

These are just a few of the many built-in functions provided by NumPy to create arrays. By using these functions, you can quickly create arrays of specific shapes and initialize them with desired values, saving time and effort in array creation and initialization.
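
As a further illustration of the step argument of arange (an added sketch, not from the original article), a step of 2 skips every other value and a negative step counts down:

```python
import numpy as np

# Step of 2: every other integer from 0 up to (but not including) 10.
evens = np.arange(0, 10, 2)
print(evens)      # [0 2 4 6 8]

# Negative step: count down from 5, stopping before 0.
countdown = np.arange(5, 0, -1)
print(countdown)  # [5 4 3 2 1]
```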

7. np.linspace

np.linspace is a NumPy function that is used to create an array of evenly spaced numbers over a specified range.

The syntax of np.linspace is as follows:

np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
        

Here is a breakdown of the parameters:

  • start: The starting value of the sequence.
  • stop: The end value of the sequence.
  • num: The number of evenly spaced values to generate between start and stop. By default, it is set to 50.
  • endpoint: If set to True (default), the stop value is included in the array. If set to False, the stop value is not included.
  • retstep: If set to True, the function returns the spacing between the numbers as the second output.
  • dtype: The data type of the output array. If not specified, it is determined based on other input parameters.

Here are a few examples to illustrate the usage of np.linspace:

Example 1:

import numpy as np

arr = np.linspace(0, 1, num=5)
print(arr)
# Output: [0.   0.25 0.5  0.75 1.  ]
        

In this example, np.linspace generates an array of 5 evenly spaced numbers between 0 and 1 (inclusive).

Example 2:

import numpy as np

arr, step = np.linspace(0, 1, num=5, retstep=True)
print(arr)
# Output: [0.   0.25 0.5  0.75 1.  ]
print(step)
# Output: 0.25
        

In this example, retstep=True is used to return the spacing between the numbers as the second output. The variable step contains the spacing value.

Example 3:

import numpy as np

arr = np.linspace(0, 10, num=10, endpoint=False)
print(arr)
# Output: [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]


In this example, endpoint=False is used to exclude the endpoint (10) from the array. The resulting array contains 10 evenly spaced numbers from 0 to 9.

np.linspace is commonly used to create arrays of a specified length with evenly spaced values, which can be useful in various numerical computations and plotting tasks.
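
A typical use is sampling a function at evenly spaced points, for example before plotting it. The following is a minimal sketch with illustrative variable names, evaluating the sine function at five points across one full period:

```python
import numpy as np

# Five sample points covering one full period of the sine function.
x = np.linspace(0, 2 * np.pi, num=5)

# np.sin is applied to every sample point at once.
y = np.sin(x)

print(x.shape)  # (5,)
print(y.shape)  # (5,)
```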


np.diagonal

In NumPy, the np.diagonal function is used to extract the diagonal elements from a given matrix. The syntax for the function is as follows:

numpy.diagonal(a, offset=0, axis1=0, axis2=1)
        

Let's break down the parameters of this function:

  • a: The input matrix from which you want to extract the diagonal elements.
  • offset: (Optional) The offset of the diagonal from the main diagonal. By default, it is 0, indicating the main diagonal. A positive value gives an upper diagonal, and a negative value gives a lower diagonal.
  • axis1 and axis2: (Optional) The axes along which to compute the diagonals. By default, axis1=0 and axis2=1 indicate the first and second axes, respectively.

Now, let's look at an example to understand how np.diagonal works:

Example:

import numpy as np

matrix = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

result = np.diagonal(matrix)

print(result)
        

Output:

[1 5 9]
        

In this example, we have an input matrix called matrix. By using np.diagonal(matrix), we extract the diagonal elements of the matrix, which are [1, 5, 9]. The resulting diagonal elements are returned as a 1-dimensional array.

It's worth mentioning that if you want to extract a diagonal that is not the main diagonal, you can use the offset parameter. For example, np.diagonal(matrix, offset=1) will give you the upper diagonal elements, which in this case are [2, 6].

import numpy as np

matrix = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

result = np.diagonal(matrix, offset=1)

print(result)
        

Output:

[2 6]
        

In this example, we use the offset parameter with a value of 1. This extracts the elements one position above the main diagonal. So, the resulting diagonal elements are [2, 6].
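
A negative offset works the same way in the other direction. As a short added sketch on the same matrix, offset=-1 selects the elements one position below the main diagonal:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# offset=-1 selects the diagonal one position below the main diagonal.
lower = np.diagonal(matrix, offset=-1)
print(lower)  # [4 8]
```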


axis1 and axis2 specify the two-dimensional plane where the diagonal is taken from. In np.diagonal(a, offset=0, axis1=0, axis2=1), the default values are axis1=0 and axis2=1. This means that the function will look for the diagonal in the plane defined by the first and second axes.

Here is a simple usage example in a 2D scenario:

import numpy as np

matrix = np.array([[1, 2, 3], 
                  [4, 5, 6], 
                  [7, 8, 9]])

# Get the main diagonal
result = np.diagonal(matrix)
print(result)  # Output: [1 5 9]
        

In a 2D array, axis1=0 and axis2=1 will find the diagonal from top-left to bottom-right, which is the standard behavior. The offset can move the starting point for the extraction of the diagonal.

However, the axis1 and axis2 parameters become very useful when dealing with arrays of higher dimensions (3D and beyond). They specify the two axes that define the 2D plane in which the diagonal is considered.

Let me illustrate with a 3D array example:

import numpy as np

# 3D array (2x2x2)
a = np.arange(8).reshape(2, 2, 2)
print(a)

# Specify axis1 and axis2 to obtain certain diagonals
d = a.diagonal(0, axis1=0, axis2=1)
print(d)
        

Output from print(a):

[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]
        

Output from print(d):

[[0 6]
 [1 7]]
        

What happened here? The 3D array a consists of two 2x2 matrices stacked on top of each other. Calling diagonal(0, axis1=0, axis2=1) takes the diagonal in the plane spanned by the first two axes, once for each index k along the remaining axis (axis 2). For k=0, the slice a[:, :, 0] is [[0 2] [4 6]], whose diagonal is [0 6]. For k=1, the slice a[:, :, 1] is [[1 3] [5 7]], whose diagonal is [1 7]. Each diagonal becomes one row of the result.

The axes referred to by axis1 and axis2 are removed from the original array (a) and a new axis is appended to the end, which corresponds to the diagonals. In this example, the array a has shape (2, 2, 2), and after taking the diagonal, the result has shape (2, 2) because the first two dimensions are "removed" and a new one is "added" for the diagonals.

Let's take another 3D array to see how the axes work in the np.diagonal function:

import numpy as np

# A 3D array (2x3x3), looks like two 3x3 matrices stacked on top of each other
a = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]], 
             [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])

print(a)
        

Our 3D array 'a' looks like this:

[[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]]

[[10 11 12]
 [13 14 15]
 [16 17 18]]]
        

Now, let's use np.diagonal function with different axis options:

d = a.diagonal(0, axis1=0, axis2=1)  
print(d)
        

Output will be:

[[ 1 13]
 [ 2 14]
 [ 3 15]]


Here, axis1=0 and axis2=1 mean the diagonal is taken in the plane spanned by the "depth" (1st dimension or axis=0) and the "rows" (2nd dimension or axis=1) of the 3D array, once for each column index k along axis 2. Each row of the result is [a[0, 0, k], a[1, 1, k]]. The resulting array has shape (3, 2): three values of k, and a diagonal of length 2 because axis 0 has only two entries.

But if we set axis1=0 and axis2=2, it means the diagonals are taken along the "depth" (1st dimension or axis=0) and "columns" (3rd dimension or axis=2) of the 3D array. Look how the output changes:

d = a.diagonal(0, axis1=0, axis2=2)
print(d)
        

Output:

[[ 1 11]
 [ 4 14]
 [ 7 17]]


This time each row of the result is [a[0, j, 0], a[1, j, 1]] for a row index j along axis 1, so the diagonal elements come from different 2D planes of the original 3D array a.
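
One way to check results like these is to rebuild them from plain 2D slices. The sketch below is an added illustration, not part of the original walkthrough: for axis1=0 and axis2=1, row k of the result should equal the diagonal of the slice a[:, :, k].

```python
import numpy as np

# Same values as the 2x3x3 array above: 1..18.
a = np.arange(1, 19).reshape(2, 3, 3)

d = a.diagonal(0, axis1=0, axis2=1)

# Rebuild the result by taking the diagonal of each 2-D slice a[:, :, k].
rebuilt = np.array([np.diagonal(a[:, :, k]) for k in range(a.shape[2])])

print(d.shape)                     # (3, 2)
print(np.array_equal(d, rebuilt))  # True
```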


Transpose an array

import numpy as np

A = np.array([[1, 2], [3, 4]])

B = A.T        

In this code, import numpy as np imports the NumPy library, which is commonly used for numerical computations in Python.

A = np.array([[1, 2], [3, 4]]) creates a NumPy array called A with shape (2, 2). The array contains the elements 1, 2, 3, and 4 arranged in a 2x2 matrix.

The next line, B = A.T, uses the .T attribute to get the transpose of array A. The transpose operation swaps the rows and columns of a matrix, effectively flipping it over its diagonal.

So, in this case, the transpose of A is assigned to the variable B. The resulting array B will have shape (2, 2), and its elements will be arranged such that the rows of B are the columns of A, and the columns of B are the rows of A.

Here's the final array B, after taking the transpose:

[[1 3]
 [2 4]]
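
The effect is easier to see on a non-square matrix, where the shape itself changes. Here is a small added sketch (the name M is illustrative):

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

# Transposing swaps the two axes, so a (2, 3) array becomes (3, 2).
print(M.shape)    # (2, 3)
print(M.T.shape)  # (3, 2)
print(M.T)
# [[1 4]
#  [2 5]
#  [3 6]]
```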


========================

Practice Examples:

Let's take a break from theory and start practicing to make sure everything is clear:

Question:

Create a numpy array of 52 evenly linearly spaced points between 1 and 5.

Solution:

Try to solve yourself and compare the solution to the following solution:
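
One possible solution, offered here as a sketch, uses np.linspace, which includes both endpoints by default:

```python
import numpy as np

# 52 evenly spaced points from 1 to 5, both endpoints included.
points = np.linspace(1, 5, 52)

print(points.shape)           # (52,)
print(points[0], points[-1])  # 1.0 5.0
```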


=======

Understanding Axis in NumPy Arrays

In the context of arrays, an axis refers to a specific dimension along which operations can be performed. Arrays can have one or more dimensions, and each dimension is associated with an axis. Let's break down the concept of axes with a few examples:

1-D Array:

An array holding a single sequence of values has one axis. In a 1D array, there is only one way to traverse the elements, so there is only one axis.

import numpy as np 
arr_1d = np.array([1, 2, 3, 4])        


2-D Array:

A 2D array has two axes: rows (axis 0) and columns (axis 1). You can think of a 2D array as a matrix, where each row is a list of values, and columns are formed by taking elements from each row at the same index.

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])        




3-D Array:

A 3D array has three axes: axis 0, axis 1, and axis 2. Think of a 3D array as a collection of 2D arrays stacked on top of each other.

import numpy as np

arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(arr_3d)

'''
Output:
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
'''        


When performing operations on arrays, you can specify along which axis you want the operation to occur. For example, when calculating the sum along an axis, the operation will sum the values along that axis, effectively collapsing that axis. Similarly, when finding the minimum or maximum value along an axis, the operation will be performed along that axis.

Here's an example to illustrate using the axis parameter with numpy.sum():

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

sum_along_rows = np.sum(arr, axis=0)     # Sum along axis 0 (columns)
sum_along_columns = np.sum(arr, axis=1)  # Sum along axis 1 (rows)

print("Sum along rows:", sum_along_rows)        # Output: [5 7 9]
print("Sum along columns:", sum_along_columns)  # Output: [ 6 15]

In the context of numpy.argmin() and numpy.argmax(), specifying an axis allows you to find the indices of the minimum or maximum values along that axis, rather than across the entire array.
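
A short added sketch of that behavior, using a small example array chosen for illustration:

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [0, 5, 4]])

# Without an axis, the index refers to the flattened array.
flat_max = np.argmax(arr)
print(flat_max)      # 4

# axis=0: for each column, the row index of the largest value.
col_max = np.argmax(arr, axis=0)
print(col_max)       # [0 1 1]

# axis=1: for each row, the column index of the smallest value.
row_min = np.argmin(arr, axis=1)
print(row_min)       # [1 0]
```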

======

Concatenate two NumPy arrays

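The function used to concatenate NumPy arrays is numpy.concatenate(). For 1-D arrays such as the ones below, the only available axis is 0, so the arrays are joined end to end; for 2-D arrays, setting axis=1 joins them horizontally (side by side). Here's an example: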

import numpy as np

# Creating two NumPy arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Concatenating the arrays horizontally
result = np.concatenate((arr1, arr2), axis=0)

print(result)
        

The output will be:

[1 2 3 4 5 6]
        

Note that the axis parameter determines the axis along which the arrays are concatenated. In this case the arrays are 1-D, so axis=0 (the default) is the only valid choice, and the result is a single, longer 1-D array.

You can achieve the same result using the numpy.hstack() function. Here's an example:

import numpy as np

# Creating two NumPy arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Concatenating the arrays horizontally using hstack
result = np.hstack((arr1, arr2))

print(result)
        

The output will be identical to the previous example:

[1 2 3 4 5 6]
        

The numpy.hstack() function horizontally stacks the arrays, combining them into a single array.
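
With 2-D arrays the choice of axis matters. The following added sketch (with illustrative arrays a and b) shows axis=0 adding rows and axis=1 adding columns, the latter matching np.hstack():

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# axis=0 stacks b below a (more rows).
stacked = np.concatenate((a, b), axis=0)
print(stacked.shape)  # (4, 2)

# axis=1 places b to the right of a (more columns).
side_by_side = np.concatenate((a, b), axis=1)
print(side_by_side)
# [[1 2 5 6]
#  [3 4 7 8]]
```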

It is also worth knowing two other functions for merging arrays, np.vstack() and np.append(), which behave differently:

  1. numpy.vstack():

  • It is used to vertically stack/concatenate arrays.
  • It takes a sequence of arrays and stacks them vertically to create a new array.
  • The arrays must have the same number of columns (along the horizontal axis).
  • It returns a new array with the combined rows.
  • Here's an example:

import numpy as np

# Creating two NumPy arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Vertically stacking the arrays using vstack
result = np.vstack((arr1, arr2))

print(result)
        

Output:

[[1 2 3]
 [4 5 6]]
        

  2. numpy.append():

  • It is used to append values to an existing array.
  • It takes an array and a value or an array as arguments.
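  • If no axis is given, both inputs are flattened and the values are appended to the end of the flattened array.
  • If an axis is given, the values must have the same number of dimensions as the array and a matching shape along the other axes; they are then appended along that axis (for example, as a new row with axis=0).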
  • It returns a new array with the appended values.
  • Here's an example:

import numpy as np

# Creating a NumPy array
arr = np.array([1, 2, 3])

# Appending a value to the array using append
result1 = np.append(arr, 4)

# Appending another array as a row; with axis=0 both inputs must be 2-D,
# so the 1-D array is wrapped in a list to give it shape (1, 3)
result2 = np.append([arr], [[4, 5, 6]], axis=0)

print(result1)
print(result2)
        

Output:

[1 2 3 4]
[[1 2 3]
 [4 5 6]]
        

In summary, np.vstack() is used to vertically stack multiple arrays, while np.append() adds values to the end of a flattened array or, when an axis is given, along that axis (for example, as a new row).

Element-wise multiplication

Element-wise multiplication, also known as Hadamard product, is a mathematical operation that takes two arrays/matrices of the same dimensions and produces another array/matrix of the same dimension as the operands where each element i, j is the product of elements i, j of the original two arrays/matrices. It's a way of multiplying corresponding entries of arrays/matrices together.

To perform element-wise multiplication with NumPy in Python, you use the '*' operator between two NumPy arrays. Here's an example:

import numpy as np

# Create two numpy arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Perform element-wise multiplication
product_array = array1 * array2

print(product_array)
        

In this example, the result will be:

[ 4 10 18]
        

This is because 1*4 = 4, 2*5 = 10, and 3*6 = 18. Note that the arrays must have the same shape, or shapes that NumPy can broadcast together, for this operation.

Also, NumPy provides the multiply() function which also performs element-wise multiplication.

Here's an example:

import numpy as np

# Create two numpy arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Perform element-wise multiplication
product_array = np.multiply(array1, array2)

print(product_array)
        

This will produce the same output as the previous example:

[ 4 10 18]
        

Just like the '*' operator, the np.multiply() function requires the arrays to have the same shape or shapes that can be broadcast together.
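
The shapes do not have to be identical as long as NumPy can broadcast them together. Here is a minimal added sketch, multiplying a (2, 3) matrix by a 1-D row and by a scalar (the array names are illustrative):

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 100, 1000])

# The 1-D row of shape (3,) is broadcast across both rows of the matrix.
scaled = matrix * row
print(scaled)
# [[  10  200 3000]
#  [  40  500 6000]]

# A scalar is broadcast to every element.
doubled = matrix * 2
print(doubled)
# [[ 2  4  6]
#  [ 8 10 12]]
```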


========

Array Shape and Dimensions

You can determine the shape of a NumPy array using the shape attribute and its number of dimensions using the ndim attribute. The shape gives the size of each dimension, and the number of dimensions is sometimes called the array's rank. For example:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.shape)
# Output: (2, 3)

print(arr.ndim)
# Output: 2
        

You can also get the size, which is the total number of elements:

print(arr.size)
# Output: 6

Here is another example of a 3-dimensional array with (2, 2, 3) shape:

import numpy as np


arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])


print(arr.shape)
# Output: (2, 2, 3)


print(arr.ndim)
# Output: 3


print(arr.size)
# Output: 12


print(arr)
'''
[[[ 1  2  3]
  [ 4  5  6]]


 [[ 7  8  9]
  [10 11 12]]]
'''        


reshape

In NumPy, the reshape function is used to change the shape (dimensions) of an array without changing its data. The syntax for the reshape function is as follows:

numpy.reshape(a, newshape, order='C')
        

Let's break down the parameters of this function:

  • a: The input array that you want to reshape.
  • newshape: The new shape that you want to give to the array. It can be a single integer or a tuple of integers specifying the dimensions of the new shape.
  • order: (Optional) The order in which the elements should be arranged in the reshaped array. It can be 'C' for C-style (row-major) order or 'F' for Fortran-style (column-major) order. The default is 'C'.

Here are a few examples to demonstrate how reshape works:

Example 1:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
new_arr = np.reshape(arr, (2, 3))
print(new_arr)
        

Output:

[[1 2 3]
 [4 5 6]]
        

In this example, we have the input array arr with 6 elements. By using np.reshape(arr, (2, 3)), we reshape the array into a 2-dimensional array with 2 rows and 3 columns.

Example 2:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
new_arr = np.reshape(arr, (3, -1))
print(new_arr)
        

Output:

[[1 2]
 [3 4]
 [5 6]]
        

In this example, we use -1 as one of the dimensions in newshape. This indicates to NumPy that it should calculate the appropriate size based on the other dimension and the number of elements in the array. So, in this case, newshape is (3, -1), and NumPy automatically determines that the resulting array should have 3 rows and 2 columns.

Note that the total number of elements in the reshaped array must match the total number of elements in the original array. Otherwise, a ValueError will be raised.
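
As a small added sketch of that failure case, asking for a (4, 2) shape requires 8 elements, so reshaping a 6-element array raises the error. The exact wording of the message may vary between NumPy versions, so it is only printed here:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

error_message = None
try:
    # 4 * 2 = 8 elements would be needed, but arr has only 6.
    np.reshape(arr, (4, 2))
except ValueError as err:
    error_message = str(err)

print("Reshape failed:", error_message)
```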

Another way is to call reshape as a method on the array itself:

arr.reshape(shape)

Example:

import numpy as np

arr = np.random.randint(1, 101, 100).reshape(10, -1)

print(arr.shape)

# Output: (10, 10)        

  • np.random.randint(1, 101, 100): This generates a 1-dimensional NumPy array of 100 random integers between 1 and 100 (inclusive).
  • reshape(10, -1): This reshapes the 1-dimensional array into a 2-dimensional array (a matrix) with 10 rows and an automatically determined number of columns. The -1 tells NumPy to work out the number of columns from the number of rows and the length of the array. Since the array holds 100 elements and we ask for 10 rows, NumPy creates 10 columns, resulting in a shape of (10, 10).

So, overall, this line of code generates a 2-dimensional array of shape (10, n), where n = 100 / 10 = 10. Each element in the array is a random integer between 1 and 100.

Indexing and Slicing

NumPy arrays can be indexed and sliced much like Python lists, allowing you to access and manipulate specific elements or subarrays within an array. Here is a full example:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr[0])
# Output: 1

print(arr[1:4])
# Output: [2 3 4]

arr[0] = 10
print(arr)
# Output: [10  2  3  4  5]

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

print(arr2d[1, 2])
# Output: 6

print(arr2d[:, 1])
# Output: [2 5]
        

Explanation:

  • arr[0] returns the first element of the array arr, which is 1.
  • arr[1:4] returns a subarray consisting of elements at index 1, 2, and 3, which are [2, 3, 4].
  • arr[0] = 10 changes the value of the first element of arr to 10. The resulting array is [10, 2, 3, 4, 5].
  • arr2d[1, 2] returns the element at the second row and third column of arr2d, which is 6.
  • arr2d[:, 1] returns the second column of arr2d, which is [2, 5]. The colon : represents all elements along that axis.


  • Indexing refers to accessing individual elements of an array by specifying their position using square brackets []. In the given example, arr[0] returns the first element of the array arr, which is '1'. Indexing in NumPy starts with 0.
  • Slicing refers to accessing a range of elements from an array by specifying a start and end index, separated by a colon (:). In the given example, arr[1:4] returns a subarray consisting of elements at index 1, 2, and 3, which are [2, 3, 4]. The end index is exclusive, meaning the slice includes elements up to, but not including, the end index.

Indexing and slicing can also be used on the left side of an assignment to modify elements or subarrays. In the given example, arr[0] = 10 changes the value of the first element of arr to 10. The resulting array is [10, 2, 3, 4, 5].

For multi-dimensional arrays, indexing and slicing can be applied along each axis. In the given example, arr2d[1, 2] returns the element at the second row and third column, which is 6. The indices are separated by a comma.

To access specific columns or rows in a multi-dimensional array, the colon (:) can be used. For example, arr2d[:, 1] returns the second column of arr2d, which is [2, 5]. The colon represents all elements along that axis.

Here are detailed examples and outputs for slicing 2-dimensional NumPy arrays:

import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr2d)
# Output:
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

# Slicing rows
print(arr2d[1])
# Output: [4 5 6]

print(arr2d[0:2])
# Output:
# [[1 2 3]
#  [4 5 6]]

# Slicing columns
print(arr2d[:, 1])
# Output: [2 5 8]

print(arr2d[:, 0:2])
# Output:
# [[1 2]
#  [4 5]
#  [7 8]]

# Slicing a subarray
print(arr2d[1:3, 1:3])
# Output:
# [[5 6]
#  [8 9]]
        

Explanation:

  • arr2d is a 2-dimensional NumPy array of shape (3, 3) with elements from 1 to 9.

[[1 2 3]
 [4 5 6]
 [7 8 9]]        


  • arr2d[1] returns the second row of arr2d, which is [4, 5, 6].

  • arr2d[0:2] returns the first and second rows of arr2d, which are [[1, 2, 3], [4, 5, 6]]. The slice 0:2 runs from index 0 (inclusive) to index 2 (exclusive).



As you can see from the previous two examples, if you have only one parameter, it will be the row. In the following examples, we will start adding columns.

  • arr2d[:, 1] returns the second column of arr2d, which is [2, 5, 8]. The colon : selects all rows.
  • arr2d[:, 0:2] returns the first and second columns of arr2d, which are [[1, 2], [4, 5], [7, 8]].
  • arr2d[1:3, 1:3] returns a subarray consisting of rows 1 and 2, and columns 1 and 2, which is [[5, 6], [8, 9]].

These examples demonstrate how slicing can be used to extract specific rows, columns, or subarrays from a 2-dimensional NumPy array. The result of a slice is itself a NumPy array, so it can be reshaped and manipulated further.
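
One detail worth knowing when manipulating slices: a basic slice is a view that shares memory with the original array, so writing to the slice also changes the original. The sketch below is an added illustration (the names view and independent are chosen for this example); copy() gives an independent array when that is not wanted.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A basic slice is a view: it shares memory with the original array.
view = arr2d[0:2, 0:2]
view[0, 0] = 100
print(arr2d[0, 0])  # 100

# Call copy() when the original must stay unchanged.
independent = arr2d[0:2, 0:2].copy()
independent[0, 0] = -1
print(arr2d[0, 0])  # 100
```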



In summary, indexing allows accessing individual elements of an array, while slicing allows accessing ranges of elements or modifying subarrays. These techniques are useful for extracting and manipulating data within NumPy arrays.
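As a brief sketch of the modifying side of slicing, the example below assigns to a slice and shows that a basic slice is a view onto the original data, so an explicit .copy() is needed to get an independent array. The variable names (grid, block, safe) are illustrative only.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Assigning to a slice modifies the original array in place
grid[0:2, 0:2] = 0
print(grid)
# [[0 0 3]
#  [0 0 6]
#  [7 8 9]]

# A basic slice is a view: changes made through it reach the original array
block = grid[1:3, 1:3]
block[0, 0] = 100
print(grid[1, 1])  # 100

# Use .copy() when an independent array is needed
safe = grid[:, 2].copy()
safe[0] = -1
print(grid[0, 2])  # 3 (unchanged)
```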

Mathematical Operations

NumPy provides a wide range of mathematical functions to perform operations on arrays. These functions are optimized for efficiency and can be applied to arrays as a whole, without the need for loops. For example:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(np.sin(arr))
# Output: [ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]

print(np.sum(arr))
# Output: 15

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

print(np.mean(arr2d))
# Output: 3.5

print(np.max(arr2d, axis=0))
# Output: [4 5 6]
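
The axis argument used with np.max above applies to most reductions. As a small illustration on the same 2x3 array (the names col_sums, row_sums, and row_means are chosen here for clarity), axis=0 collapses the rows and yields one value per column, while axis=1 collapses the columns and yields one value per row:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

col_sums = np.sum(arr2d, axis=0)    # one value per column
row_sums = np.sum(arr2d, axis=1)    # one value per row
row_means = np.mean(arr2d, axis=1)  # mean of each row

print(col_sums)   # [5 7 9]
print(row_sums)   # [ 6 15]
print(row_means)  # [2. 5.]
```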
        

Cumulative Sum

The np.cumsum() function in NumPy computes the cumulative sum of elements along a given axis in an array. Cumulative sum means that for each element in the array, it adds up all the elements that come before it, including itself.

Here's how it works with a simple example:

```python

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

cumulative_sum = np.cumsum(arr)

print(cumulative_sum)

```

The output will be:

```

[ 1  3  6 10 15]

```

Explanation of the output:

- The first element is 1, which is the same in the original array.

- The second element is 1 + 2 = 3, which is the sum of the first and second elements in the original array.

- The third element is 1 + 2 + 3 = 6, which is the sum of the first three elements in the original array.

- And so on...

In more complex arrays, you can use the np.cumsum() function to efficiently compute cumulative sums along different axes or dimensions. It's particularly useful in various mathematical and statistical operations, as well as in signal processing and time series analysis.
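
To make the axis behaviour concrete, here is a minimal sketch on a small 2-D array (the variable names are illustrative). Without an axis argument the array is flattened first; with axis=0 the running total moves down each column, and with axis=1 it moves along each row:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

flat = np.cumsum(arr2d)               # no axis: the array is flattened first
down_rows = np.cumsum(arr2d, axis=0)  # running total down each column
across = np.cumsum(arr2d, axis=1)     # running total along each row

print(flat)
# [ 1  3  6 10 15 21]
print(down_rows)
# [[1 2 3]
#  [5 7 9]]
print(across)
# [[ 1  3  6]
#  [ 4  9 15]]
```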

Matrix Operations

np.dot(), np.multiply(), and np.matmul() are NumPy functions used for various types of mathematical operations involving arrays. Here are the key differences between them:

1. `np.dot()`:

- The np.dot() function performs different operations based on the dimensions of the input arrays:

  - For 1-D arrays, it calculates the dot product (inner product) of the two arrays.

  - For 2-D arrays, it performs matrix multiplication.

  - For N-dimensional arrays, it computes a sum product over the last axis of the first array and the second-to-last axis of the second array.

- np.dot(a, b) and a.dot(b) are equivalent when a is a NumPy array.

Example:

```python

import numpy as np

a = np.array([1, 2, 3])

b = np.array([4, 5, 6])

dot_product = np.dot(a, b) # Dot product of 1-D arrays: 1*4 + 2*5 + 3*6 = 32

matrix_a = np.array([[1, 2], [3, 4]])

matrix_b = np.array([[5, 6], [7, 8]])

matrix_product = np.dot(matrix_a, matrix_b) # Matrix multiplication of 2-D arrays: [[19, 22], [43, 50]]

```

2. `np.multiply()`:

- The np.multiply() function performs element-wise multiplication between two arrays.

- It can work with arrays of different shapes, as long as their shapes are compatible for broadcasting.

- If the input arrays have different shapes, NumPy will try to broadcast them to a common shape before performing the multiplication.

Example:

```python

import numpy as np

array_a = np.array([1, 2, 3])

array_b = np.array([4, 5, 6])

element_wise_product = np.multiply(array_a, array_b) # Element-wise multiplication: [4, 10, 18]

matrix_a = np.array([[1, 2], [3, 4]])

scalar = 2

scalar_multiplication = np.multiply(matrix_a, scalar) # Scalar is broadcast to every element: [[2, 4], [6, 8]]

```

3. `np.matmul()`:

- The np.matmul() function is specifically designed for matrix multiplication.

- It behaves the same way as np.dot() for 2-D arrays (matrix multiplication).

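- For N-dimensional arrays (where N > 2), it treats the inputs as stacks of matrices: it multiplies the matrices in the last two dimensions and broadcasts over the remaining leading dimensions. Unlike np.dot(), it does not accept scalar operands.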

- It's more explicit in its purpose, making it useful when dealing with higher-dimensional arrays.

Example:

```python

import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])

matrix_b = np.array([[5, 6], [7, 8]])

matrix_product = np.matmul(matrix_a, matrix_b) # Matrix multiplication: [[19, 22], [43, 50]]

```

In summary, np.dot() has different behaviors based on array dimensions, np.multiply() performs element-wise multiplication, and np.matmul() is used specifically for matrix multiplication. The choice of which function to use depends on the desired operation and the dimensions of the arrays involved.
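
A side-by-side sketch may help make the distinction concrete. The 2-D matrices are the same ones used in the examples above; the 3-D arrays x and y are hypothetical inputs added here only to show where np.dot() and np.matmul() stop agreeing:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(np.multiply(a, b))  # element-wise, same as a * b
# [[ 5 12]
#  [21 32]]

print(np.matmul(a, b))    # matrix product, same as a @ b and np.dot(a, b) here
# [[19 22]
#  [43 50]]

# With 3-D inputs the two matrix functions diverge
x = np.ones((2, 2, 3))
y = np.ones((2, 3, 2))
print(np.matmul(x, y).shape)  # (2, 2, 2): one 2x2 product per stacked pair
print(np.dot(x, y).shape)     # (2, 2, 2, 2): each matrix in x against each matrix in y
```

For 2-D inputs the @ operator is usually the most readable choice, since it maps directly to np.matmul().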

Broadcasting

Broadcasting is a powerful feature in NumPy that allows for element-wise operations between arrays of different shapes, without the need for explicit looping. It eliminates the need to have arrays of the same shape to perform operations and enables more concise and efficient code.

The broadcasting rule works as follows: NumPy compares the two shapes dimension by dimension, starting from the trailing (rightmost) dimension. Two dimensions are compatible if they are equal or if one of them is 1, and a missing leading dimension is treated as 1. An array with size 1 along a dimension behaves as if its values were repeated along that dimension, although NumPy does not actually copy the data.

To understand broadcasting better, let's consider a few examples:

Example 1:

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([[4, 5, 6], [7, 8, 9]])

result = arr1 + arr2
print(result)
        

Output:

[[ 5  7  9]
 [ 8 10 12]]
        

In this example, arr1 has shape (3,) and arr2 has shape (2, 3). By broadcasting, arr1 is expanded to (2, 3) by replicating its values along the first dimension. Then, element-wise addition is performed.

Example 2:

import numpy as np

arr1 = np.array([[1], [2]])
arr2 = np.array([3, 4, 5])

result = arr1 * arr2
print(result)
        

Output:

[[ 3  4  5]
 [ 6  8 10]]
        

In this example, arr1 has shape (2, 1) and arr2 has shape (3,). arr1 is broadcast to shape (2, 3) by repeating its single column along the second dimension, and arr2 is broadcast to (2, 3) by repeating its row along the first dimension. Then, element-wise multiplication is performed.

Example 3:

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5])

result = arr1[:, np.newaxis] + arr2
print(result)
        

Output:

[[5 6]
 [6 7]
 [7 8]]
        

In this example, arr1 has shape (3,) and arr2 has shape (2,), which are not compatible as they stand. np.newaxis inserts a new axis into arr1, changing its shape to (3, 1). Both arrays can then be broadcast to shape (3, 2), and element-wise addition is performed.

Broadcasting provides great flexibility in performing operations on arrays with different shapes. It simplifies code and improves performance by avoiding unnecessary loops. Understanding broadcasting is crucial when working with NumPy arrays in machine learning and data science tasks.
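
The three examples above all succeed. As a quick sketch of how to check compatibility ahead of time, and of what happens when shapes do not match, the snippet below uses np.broadcast_shapes (available in NumPy 1.20 and later, which is an assumption about your environment) and catches the ValueError raised for an incompatible pair. The flag name compatible is illustrative.

```python
import numpy as np

# np.broadcast_shapes reports the result shape without building any arrays
print(np.broadcast_shapes((3,), (2, 3)))  # (2, 3)
print(np.broadcast_shapes((2, 1), (3,)))  # (2, 3)
print(np.broadcast_shapes((3, 1), (2,)))  # (3, 2)

# Shapes (3,) and (2,) are incompatible: the trailing
# dimensions 3 and 2 differ and neither of them is 1
try:
    np.array([1, 2, 3]) + np.array([4, 5])
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```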


Applications in Machine Learning and Data Science

NumPy's fast and efficient operations make it an essential library in the fields of machine learning and data science. It provides the foundation for other libraries such as pandas and scikit-learn, enabling various data manipulation, preprocessing, and model building tasks.

For example, in machine learning, NumPy arrays are used to represent datasets, as well as model parameters. You can perform operations like data normalization, feature scaling, and matrix multiplications efficiently using NumPy functions.
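
As a minimal sketch of what this looks like in practice, the example below standardizes the features of a small made-up dataset and applies a set of made-up weights with a matrix product. The data, the weights, and all variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical dataset: 4 samples (rows) x 2 features (columns)
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [4.0, 800.0]])

# Standardize each feature to zero mean and unit variance;
# the per-column statistics of shape (2,) broadcast across the (4, 2) data
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

print(X_scaled.mean(axis=0))  # approximately [0. 0.]
print(X_scaled.std(axis=0))   # approximately [1. 1.]

# Apply hypothetical model weights with a matrix-vector product
weights = np.array([0.5, -0.25])
predictions = X_scaled @ weights
print(predictions.shape)  # (4,)
```

Libraries such as scikit-learn provide ready-made scalers for this; the point here is only that the underlying arithmetic is a few broadcast array operations.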

In data science tasks, NumPy is used for handling large datasets, performing statistical calculations, data cleaning, and exploratory data analysis. It provides efficient data structures and operations that accelerate the analysis process.
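
A small sketch of a typical cleaning step, assuming missing values are encoded as NaN: the readings array below is invented for illustration, and filling gaps with the mean is just one possible strategy.

```python
import numpy as np

# Hypothetical sensor readings with two missing values
readings = np.array([12.5, np.nan, 14.0, 13.5, np.nan, 15.0])

missing = np.isnan(readings)
print(missing.sum())         # 2 missing values

clean = readings[~missing]   # a boolean mask drops the NaNs
print(clean.mean())          # 13.75
print(np.nanmean(readings))  # 13.75, ignoring NaNs directly

# Alternatively, fill the gaps with the mean of the observed values
filled = np.where(missing, clean.mean(), readings)
print(filled)
```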

Linear Algebra with Python and NumPy

Conclusion

NumPy is a powerful library for numerical computations in Python, offering versatile array objects and a wide range of mathematical functions. In machine learning and data science, NumPy plays a crucial role in handling datasets, performing mathematical operations on arrays, and accelerating the overall analysis process. By leveraging NumPy, you can streamline your code, improve performance, and unlock the full potential of Python for data-driven tasks.
