Why learning Linear Algebra is important for Machine Learning
Ramkumar Rani
Gen AI | Agentic AI | RAG | Agentic Automation | Data | Lifelong Learner
Very often, aspiring Machine Learning (ML) engineers ask why they need to study mathematical concepts such as linear algebra, calculus, probability, and statistics to become ML experts. In this post, I will focus on the necessity of linear algebra in machine learning. Linear algebra is a mathematical area that is immensely helpful for every engineering specialization, particularly Machine Learning.
Linear algebra is the branch of mathematics mainly concerned with linear equations and linear transformations. In Machine Learning, linear equations and linear transformations underpin many algorithms, such as least squares. In other words, linear algebra, which converts input vectors into outputs using linear transformations, is essential for understanding ML algorithms.
Broadly, the following concepts of linear algebra are applied across ML algorithms:
- Scalars, vectors, matrices, and tensors
- Operations on vectors and matrices
- Linear Dependence
- Linear Span
- Norms
- Matrix inverse & identity matrices
- Special types of matrices and vectors
- Singular Value Decomposition (SVD)
- Eigendecomposition
- Dimensionality reduction – Principal Component Analysis (PCA)
In this article, I will explore a handful of linear algebra concepts and show how they are helpful in machine learning and deep learning. To begin with, linear algebra organizes numbers into structures such as scalars, vectors, matrices, and tensors. These structures open new avenues through special operations such as matrix inversion and matrix multiplication. In particular, tensors play a major role in image processing and pattern recognition; for example, you need a 3-D tensor to process an RGB color image. More importantly, linear algebra helps us understand and visualize more complex patterns in data, beyond three dimensions.
In machine learning, linear algebra simplifies the complexity of data and presents it in a concise form. In particular, in deep learning (a specialized branch of ML), the values of each layer of neurons in a network can be represented as a vector.
A simpler set of definitions for these linear algebra structures (see the NumPy sketch after the list):
- Scalars: a scalar is a single number, e.g., 10.
- Vectors: a vector is a one-dimensional array of numbers; it can be a row vector or a column vector.
- Matrices: a matrix is a 2-D structure with rows and columns; a specific entry is identified by its row and column indices.
- Tensors: tensors are multidimensional arrays, typically with more than two dimensions. For example, an RGB image can be represented as a 3-D tensor.
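As a concrete illustration, here is a minimal NumPy sketch of these four structures; the specific shapes (including the 224×224 image size) are arbitrary examples, not from the original post:

```python
import numpy as np

scalar = 10                                   # a single number
vector = np.array([1, 2, 3])                  # 1-D array, shape (3,)
matrix = np.array([[1, 2], [3, 4]])           # 2-D array, shape (2, 2)
tensor = np.zeros((224, 224, 3))              # 3-D tensor, e.g. an RGB image
print(vector.ndim, matrix.ndim, tensor.ndim)  # 1 2 3
```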
Linear algebra supports various matrix operations such as transpose, addition, and multiplication. Consider the following matrix transpose:
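$$(A^{\top})_{ij} = A_{ji}, \qquad \text{e.g.}\quad \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}^{\top} = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}$$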
The transpose can be thought of as a mirror image of the matrix across its main diagonal. As you can see, this operation is helpful in many machine learning and deep learning tasks, especially in image recognition.
Another linear algebra area that plays a crucial role in machine learning and deep learning is matrix multiplication (the matrix product). We can multiply two matrices of different dimensions, as long as the number of columns of the first matrix equals the number of rows of the second. For example, consider the following matrix product:
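$$Z = XY \in \mathbb{R}^{i \times j}, \qquad Z_{mn} = \sum_{t=1}^{k} X_{mt}\, Y_{tn}, \qquad X \in \mathbb{R}^{i \times k},\; Y \in \mathbb{R}^{k \times j}$$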
In this case, the matrix X has i rows and k columns, and the matrix Y has k rows and j columns; when you multiply them, the operation produces a new matrix Z with i rows and j columns. Matrix multiplication is critical in neural networks, where inputs are passed to the next layer as vectors or matrices and multiplied by a weight matrix to produce the output.
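A minimal NumPy sketch of this shape rule (the dimensions here are arbitrary examples):

```python
import numpy as np

# X has i rows and k columns; Y has k rows and j columns.
i, k, j = 4, 3, 2
X = np.random.rand(i, k)
Y = np.random.rand(k, j)

Z = X @ Y         # matrix product: valid because X's columns match Y's rows
print(Z.shape)    # (4, 2) -> i rows, j columns
```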
Another linear algebra concept that comes in handy in deep learning is orthogonal matrices. Vectors x and y are orthogonal to each other if xᵀy = 0, which means x and y are at 90 degrees to each other. An orthogonal matrix is a square matrix whose rows and columns are mutually orthonormal vectors. Orthogonal matrices are useful in deep learning: they can be used to initialize weights to avoid vanishing or exploding gradients, a problem that arises when you multiply matrices across many steps of a deep network.
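In symbols, for vectors x and y and an orthogonal matrix Q:

$$x^{\top} y = 0 \quad \text{(orthogonal vectors)}, \qquad Q^{\top} Q = Q Q^{\top} = I \quad \text{(orthogonal matrix)}$$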
Calculating norms is another linear algebra topic that plays a major role in machine learning. Norms measure the length of vectors. A commonly used norm is the L2 norm, which measures the Euclidean length of a vector (applying it to the difference of two vectors gives the Euclidean distance between them). It is commonly denoted as:
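$$\|x\|_2 = \sqrt{\sum_{i} x_i^2}$$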
For example, the L2 norm is used in Ridge regression. The L1 norm is also quite popular in machine learning and is used in algorithms such as Lasso; it is defined as:
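$$\|x\|_1 = \sum_{i} |x_i|$$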
Ridge and Lasso are popular regularized regression algorithms in Machine Learning.
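For concreteness, their objectives are commonly written as follows (here w denotes the model weights and λ ≥ 0 the regularization strength, standard notation not used in the original post):

$$\text{Ridge:}\;\; \min_{w} \|y - Xw\|_2^2 + \lambda \|w\|_2^2 \qquad\qquad \text{Lasso:}\;\; \min_{w} \|y - Xw\|_2^2 + \lambda \|w\|_1$$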
With these linear algebra concepts in place, you can move on to eigendecomposition, which involves eigenvalues and eigenvectors. Eigendecomposition is a prerequisite for Principal Component Analysis (PCA), one of the most commonly used machine learning algorithms for dimensionality reduction. PCA is probably the oldest and best known of the techniques of multivariate analysis. The goal of PCA is to compute principal components; in essence, this involves identifying the eigenvalues and eigenvectors of the data's covariance matrix.
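A minimal NumPy sketch of this idea, computing principal components via eigendecomposition of the covariance matrix (the function name, sizes, and random data are illustrative, not from the original post):

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via eigendecomposition."""
    X_centered = X - X.mean(axis=0)           # center each feature
    cov = np.cov(X_centered, rowvar=False)    # covariance matrix of features
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components            # project onto components

X = np.random.default_rng(1).normal(size=(100, 5))
X_reduced = pca(X, n_components=2)
print(X_reduced.shape)  # (100, 2)
```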
Finally, I would like to emphasize how linear transformations play an important role in neural networks. Let us consider a multi-layer perceptron (MLP) with a single hidden layer. This network is defined as:
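$$\hat{y} = W_2\, g(W_1 x + b_1) + b_2$$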
Here, W1 and b1 are the weight matrix and bias vector of the first linear transformation applied to the input, and g is a nonlinear activation function applied element-wise. Correspondingly, W2 and b2 are the parameters of the second linear transformation in the network.
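A minimal NumPy sketch of this forward pass (the layer sizes and the ReLU activation are illustrative choices, not specified in the original post):

```python
import numpy as np

def relu(z):
    # element-wise nonlinear activation g
    return np.maximum(0.0, z)

# Hypothetical sizes: 4 input features, 8 hidden units, 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)

x = rng.normal(size=4)       # input vector
h = relu(W1 @ x + b1)        # first linear transformation + activation
y_hat = W2 @ h + b2          # second linear transformation
print(y_hat)
```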
From this short introduction, it should be clear that linear algebra plays an immense role in machine learning and deep learning. This article has highlighted some of the linear algebra concepts behind machine learning, and I am sure you will agree that linear algebra is one of the secret weapons in mastering machine learning.