#3. Math for ML Part 1: Linear Algebra
Aarzoo Chourasia
Senior Data Scientist @Sony | Ex-Amazon | Harvard | Math Enthusiast | Bibliophile | Explorer
In this edition, we will go through the core concepts of Linear Algebra and their significance in Machine Learning. We learn better when we can tie concepts to problems and examples. Linear Algebra is remarkably powerful, and almost all data can be represented in the form of matrices. Book's section with annotations on GitHub.
Knowing these basics of Linear Algebra is enough to get started. We don't need a deep dive at the beginning, as the ideas will consolidate once we see them in action.
I have taken crisp notes from the awesome 3b1b Essence of Linear Algebra playlist (the formatting in a few lines can be ignored, as no content is missing). I hope these notes will be helpful for refreshing the concepts or for a quick revision before interviews. That said, I would highly recommend that you watch the videos and use these notes as a companion, not as a replacement.
Scalars, Vectors, and Tensors: Building blocks of Linear Algebra. As the name suggests, scalars are real numbers that scale objects/quantities. In an ML context, vectors store and represent data points in a space. Tensors are data structures: n-dimensional (nD) arrays that store numbers.
Note that the dimension of a vector is the number of elements/components in it, whereas the dimension of a tensor is the number of axes. E.g. [1,5,8,3] is a 1D tensor but a 4D vector. A matrix is a 2D tensor that is a collection of 1D tensors (vectors), a cube is a 3D tensor that is a collection of 2D tensors (matrices), and so on. Think of text in NLP as a 3D tensor (a collection of sentences, each of which is a 2D tensor). In CV, think of a collection of images as a 4D tensor (each image is a 3D tensor made of three 2D matrices, one per RGB channel) and a collection of videos (each video being a collection of images) as a 5D tensor. You get the idea.
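To make the "axes vs. components" distinction concrete, here is a minimal sketch in plain Python. It assumes tensors are stored as regular, non-empty nested lists, and the helper name num_axes is made up for illustration (in practice you would read this from an array library).

```python
def num_axes(tensor):
    """Count the axes (nesting depth) of a regular, non-empty nested list."""
    depth = 0
    while isinstance(tensor, list):
        depth += 1
        tensor = tensor[0]  # assumption: every level is non-empty and regular
    return depth

vector = [1, 5, 8, 3]               # 1 axis, 4 components
matrix = [[1, 2], [3, 4], [5, 6]]   # 2 axes (3 rows, 2 columns)
cube = [matrix, matrix]             # 3 axes (two matrices stacked)

print(num_axes(vector), len(vector))  # 1 4
print(num_axes(matrix))               # 2
print(num_axes(cube))                 # 3
```

So [1,5,8,3] has one axis (a 1D tensor) but four components (a 4D vector).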
Row vector and Column vector: A column vector (n×1) is the default representation of a data point in Rn (nD space, where n is the number of entries in the column). Row vectors (1×m) are linear functions on Rm that operate on column vectors.
Distance from origin: This is the distance of a point in vector space from the origin. Often used in anomaly/fraud detection.
Euclidean distance: A measure of dissimilarity between two points (the smaller the distance, the more similar the points), calculated as the straight-line distance between them. It is key in clustering algorithms like k-means.
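Both distances come from the same formula: the distance from the origin is the length of the vector, and the Euclidean distance between two points is the length of their difference. A minimal sketch in plain Python (the function names and sample points are illustrative):

```python
import math

def norm(v):
    """Distance of point v from the origin."""
    return math.sqrt(sum(x * x for x in v))

def euclidean_distance(p, q):
    """Straight-line distance between points p and q."""
    return norm([a - b for a, b in zip(p, q)])

p = [3.0, 4.0]
q = [0.0, 8.0]

print(norm(p))                   # 5.0
print(euclidean_distance(p, q))  # 5.0, since p - q = [3, -4]
```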
Shifting: Adds or subtracts a constant, typically a measure of central tendency such as the mean, so that the data is centered without changing its variance. Used in data preprocessing to center features, which can help gradient descent optimization.
Scaling: Multiplying a vector by a scalar resizes the vector. Used in data preparation, feature engineering, and algorithms sensitive to differences in vector magnitude, such as gradient-based models.
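A small sketch of both operations on a single feature, using only the standard library. The sample values and the scale factor 0.5 are made up; the point is that shifting leaves the variance alone, while scaling by c multiplies it by c squared.

```python
from statistics import mean, pvariance

feature = [2.0, 4.0, 6.0, 8.0]

# Shifting: subtract the mean so the feature is centered at zero.
mu = mean(feature)
centered = [x - mu for x in feature]

# Scaling: multiply every value by a scalar.
scaled = [0.5 * x for x in centered]

print(mean(centered))                           # 0.0
print(pvariance(feature), pvariance(centered))  # 5.0 5.0 (unchanged by the shift)
print(pvariance(scaled))                        # 1.25 (0.5 ** 2 times the original)
```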
Dot product: It measures similarity between vectors. In NLP, cosine similarity uses the dot product to find the cosine of the angle between word vectors/embeddings.
Angle between vectors: Defined from the dot product. It indicates similarity; a small angle between vectors implies that they point in similar directions.
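The link between the dot product, cosine similarity, and the angle can be sketched as follows. This is plain Python with illustrative names and toy 2D vectors, not real embeddings.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| |v|); assumes neither vector is zero."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def angle_degrees(u, v):
    # Clamp to [-1, 1] to guard against tiny floating-point overshoot.
    c = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.degrees(math.acos(c))

u = [1.0, 0.0]
v = [1.0, 1.0]
w = [0.0, 2.0]

print(dot(u, v))            # 1.0
print(angle_degrees(u, v))  # approximately 45 degrees
print(angle_degrees(u, w))  # approximately 90 degrees (orthogonal, dot product is 0)
```

Note that cosine similarity ignores vector length: w has length 2, but only its direction matters.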
Cross product: Also known as vector product. Multiplies two vectors in three-dimensional space to produce a vector that is perpendicular to both of the original vectors. Used in AR and 3D graphics orientation.
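A minimal sketch of the cross product for 3D vectors, with a check that the result is perpendicular to both inputs (names are illustrative):

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

x_axis = [1, 0, 0]
y_axis = [0, 1, 0]
z = cross(x_axis, y_axis)

print(z)                               # [0, 0, 1]
print(dot(z, x_axis), dot(z, y_axis))  # 0 0, perpendicular to both inputs
```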
Unit vectors: Vectors of length one. Normalizing vectors to unit length helps in feature scaling and normalization for more stable gradient descent.
Projection of a vector: Projects data points onto directions of interest. Used in PCA and dimensionality reduction.
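Unit vectors and projections are closely related: projecting v onto a direction d keeps only the part of v that lies along d. A short sketch in plain Python, with illustrative names and assuming non-zero vectors:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    """Rescale v to length one (assumes v is not the zero vector)."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def project(v, onto):
    """Projection of v onto the direction of `onto`: (v.d / d.d) * d."""
    scale = dot(v, onto) / dot(onto, onto)
    return [scale * x for x in onto]

v = [3.0, 4.0]
direction = [1.0, 0.0]

print(unit(v))                # [0.6, 0.8]
print(project(v, direction))  # [3.0, 0.0]

# The leftover part of v is orthogonal to the direction we projected onto.
residual = [a - b for a, b in zip(v, project(v, direction))]
print(dot(residual, direction))  # 0.0
```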
Basis vectors: Serve as foundational vectors to express other vectors, crucial for representing feature spaces and transformations.
Equation in n-D geometry: Defines boundaries in high-dimensional spaces. SVMs use hyperplanes to separate classes in high-dimensional data.
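A hyperplane in n-D is the set of points x with w·x + b = 0, and the sign of w·x + b tells you which side a point falls on. The sketch below uses a hypothetical weight vector w and offset b chosen by hand; an SVM would learn these from data.

```python
def side_of_hyperplane(w, b, x):
    """Return +1 or -1 depending on which side of w.x + b = 0 the point x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical boundary in 2D: the line x1 - x2 = 0.
w = [1.0, -1.0]
b = 0.0

print(side_of_hyperplane(w, b, [2.0, 1.0]))  # 1
print(side_of_hyperplane(w, b, [1.0, 3.0]))  # -1
```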
Vector norms: Measure vector length (L1/L2 norm). Used in regularization to control model complexity in linear and logistic regression and to make predictions less sensitive to noise.
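Here is a minimal sketch of the two norms and how they typically show up as regularization penalties. The weight vector and the strength lam are made-up values for illustration.

```python
import math

def l1_norm(w):
    """Sum of absolute values."""
    return sum(abs(x) for x in w)

def l2_norm(w):
    """Euclidean length."""
    return math.sqrt(sum(x * x for x in w))

weights = [3.0, -4.0, 0.0]
print(l1_norm(weights))  # 7.0
print(l2_norm(weights))  # 5.0

# Penalty terms that would be added to a loss (lam is a hypothetical strength).
lam = 0.1
lasso_penalty = lam * l1_norm(weights)       # L1, as in lasso-style regularization
ridge_penalty = lam * l2_norm(weights) ** 2  # squared L2, as in ridge-style regularization
```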
Vector spaces: Sets of vectors closed under addition and scalar multiplication, in which operations and transformations are performed. Foundational in ML tasks like embedding NLP or image data into comparison-ready forms.
Linear independence: Vectors are linearly independent if none of them is a linear combination of the others, ensuring no redundancy in features. Used in PCA and dimensionality reduction.
Orthogonal matrices: Square matrices whose column/row vectors have length one and are pairwise orthogonal, i.e. their dot product is zero. Used in whitening transformations and PCA.
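Equivalently, a matrix Q is orthogonal when QᵀQ is the identity. A rotation matrix is a standard example; the sketch below checks it with small hand-written helpers (illustrative names, matrices as nested lists).

```python
import math

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_orthogonal(q, tol=1e-9):
    """Check that Q^T Q equals the identity, up to a tolerance."""
    n = len(q)
    product = matmul(transpose(q), q)
    return all(
        abs(product[i][j] - (1.0 if i == j else 0.0)) <= tol
        for i in range(n)
        for j in range(n)
    )

theta = math.pi / 6  # rotate by 30 degrees
rotation = [
    [math.cos(theta), -math.sin(theta)],
    [math.sin(theta), math.cos(theta)],
]
stretch = [[2.0, 0.0], [0.0, 1.0]]

print(is_orthogonal(rotation))  # True
print(is_orthogonal(stretch))   # False, the first column has length 2
```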
Symmetric matrices: They are symmetric about the principal diagonal, i.e. equal to their own transpose. Used in covariance matrices to capture feature dependencies.
Diagonal matrices: All the elements off the principal diagonal are zero. They are simpler to invert and useful in scaling features.
Matrix equality: Two matrices are equal when they have the same shape and the same entries, which is essential when comparing transformation outputs.
Scalar operations on matrices: Apply the same arithmetic operation with a scalar to every entry of a data matrix. For example, adding biases to activations in neural networks.
Matrix addition, subtraction, multiplication: Combine and transform data, enabling key operations in neural networks, linear transformations, and gradient calculations essential for model training.
Linear Transformations: Apply operations such as scaling, rotation, or shearing to data (translation moves the origin, so it is affine rather than linear). They enable dimensionality reduction, feature extraction, and adjustments in neural network layers.
Matrix multiplication as composition: Combines multiple transformations into one. Used in neural networks and sequential models for efficient data processing.
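A quick sketch of what "composition" means: applying a rotation R and then a scaling S to a vector gives the same result as applying the single matrix SR. The matrices below are illustrative, and the helpers are plain-Python stand-ins for an array library.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

R = [[0, -1], [1, 0]]  # rotate 90 degrees counter-clockwise
S = [[2, 0], [0, 3]]   # scale x by 2 and y by 3
x = [1, 0]

step_by_step = matvec(S, matvec(R, x))  # rotate first, then scale
composed = matvec(matmul(S, R), x)      # one combined matrix

print(step_by_step)  # [0, 3]
print(composed)      # [0, 3]

# Order matters: scaling first and then rotating gives a different result.
print(matvec(matmul(R, S), x))  # [0, 2]
```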
Transpose: Reorients a matrix by swapping rows and columns. Aligns dimensions for operations in neural networks and simplifies calculations in linear transformations.
Determinant: A scalar that gives the factor by which a matrix scales area/volume and indicates invertibility (a zero determinant means the matrix is not invertible). Crucial for assessing transformations and solving systems of equations.
Matrix Inverse: Reverses (undoes) transformations, crucial in linear regression and optimization problems.
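For a 2×2 matrix both ideas fit in a few lines. This is a sketch for the 2×2 case only, with illustrative names; larger matrices would use a library routine.

```python
def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

def inverse2(m):
    """Inverse of a 2x2 matrix; raises if the determinant is zero."""
    det = det2(m)
    if det == 0:
        raise ValueError("matrix is singular (determinant is zero)")
    (a, b), (c, d) = m
    return [[d / det, -b / det], [-c / det, a / det]]

M = [[2.0, 1.0], [1.0, 1.0]]
print(det2(M))      # 1.0
print(inverse2(M))  # [[1.0, -1.0], [-1.0, 2.0]]

singular = [[1.0, 2.0], [2.0, 4.0]]  # second row is twice the first
print(det2(singular))                # 0.0, so no inverse exists
```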
Change of Basis: Redefines data in a new coordinate system. Enables dimensionality reduction and feature extraction in techniques like PCA for better data representation.
Rank of a matrix: The number of linearly independent rows or columns, i.e. the dimension of the output space of the transformation. Helps to detect data redundancy and the nature of the solution to a system of linear equations.
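One way to see rank as "number of non-redundant rows" is to run Gaussian elimination and count the pivots. The sketch below is a simple illustrative implementation with a hand-picked tolerance, not a numerically robust routine (libraries usually compute rank from singular values).

```python
def rank(matrix, tol=1e-9):
    """Count pivots found by Gaussian elimination with partial pivoting."""
    rows = [[float(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots = 0
    for col in range(n_cols):
        if pivots == n_rows:
            break
        # Pick the remaining row with the largest entry in this column.
        best = max(range(pivots, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[best][col]) <= tol:
            continue  # no pivot in this column
        rows[pivots], rows[best] = rows[best], rows[pivots]
        for r in range(pivots + 1, n_rows):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots

redundant = [[1, 2, 3], [2, 4, 6], [0, 1, 1]]  # row 2 is twice row 1
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(rank(redundant))  # 2
print(rank(identity))   # 3
```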
Eigenvectors, Eigenvalues: Eigenvectors stay on their own span under a transformation (they are only stretched, shrunk, or flipped), with eigenvalues as the scale factors. Used in PCA, where data is projected onto principal components for dimensionality reduction.
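The defining property A v = λ v is easy to check by hand. The sketch below uses a small symmetric matrix whose eigenpairs are known (λ = 3 with v = [1, 1], and λ = 1 with v = [1, -1]); the matrix is chosen for illustration, and finding eigenpairs of a general matrix would need a library routine.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

A = [[2, 1], [1, 2]]

print(matvec(A, [1, 1]))   # [3, 3], which is 3 * [1, 1]
print(matvec(A, [1, -1]))  # [1, -1], which is 1 * [1, -1]

# A vector that is not an eigenvector gets knocked off its span.
print(matvec(A, [1, 0]))   # [2, 1], not a multiple of [1, 0]
```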
LU decomposition: Breaks a matrix into lower and upper triangular matrices, which simplifies solving systems of linear equations and speeds up algorithms like linear regression.
QR decomposition: Separates a matrix into orthogonal and upper triangular parts, offering stability in least-squares regression and dimensionality reduction tasks.
Eigen decomposition: Extracts eigenvalues and eigenvectors, critical in PCA and spectral clustering, where it helps uncover data structure and reduce dimensions for efficient processing.
Singular value decomposition: A matrix factorization that breaks down a matrix into two orthogonal matrices and a diagonal matrix of singular values. Crucial in dimensionality reduction and RecSys.
Non-Negative Matrix Factorization: Breaks down data into non-negative components, which helps in feature extraction for topic modeling and image analysis.
Moore-Penrose Pseudoinverse: Provides approximate solutions for over/under-determined systems of linear equations, essential in linear regression for non-invertible (singular) matrices.
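For an over-determined system Ax ≈ b where A has full column rank, the pseudoinverse reduces to (AᵀA)⁻¹Aᵀ, which gives the least-squares solution. The sketch below applies that special case to fitting a line y = m·x + c, solving the 2×2 normal equations by hand. It is an illustration under the full-column-rank assumption, not a general pseudoinverse (which is usually computed via SVD).

```python
def least_squares_line(xs, ys):
    """Least-squares fit of y = m*x + c via the normal equations A^T A p = A^T y,
    where each row of A is [x_i, 1] and p = [m, c]."""
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sx = sum(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sy = sum(ys)
    det = sxx * n - sx * sx  # determinant of A^T A
    if det == 0:
        raise ValueError("A^T A is singular; need at least two distinct x values")
    m = (n * sxy - sx * sy) / det
    c = (sxx * sy - sx * sxy) / det
    return m, c

# Four equations, two unknowns, and the points happen to lie on y = 2x + 1.
print(least_squares_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)

# Three points that no single line passes through: we get the best compromise.
m, c = least_squares_line([0, 1, 2], [0, 1, 1])
print(m, c)  # slope 0.5, intercept about 0.1667
```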
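Quadratic Forms: Expressions of the form xᵀAx. They express many optimization problems (convex when A is positive semidefinite), common in ML models where objective functions need to be minimized.
Positive definite matrices: Symmetric square matrices whose eigenvalues are all positive. A positive definite Hessian means the objective is strictly convex around that point, which is useful when analyzing optimization in logistic regression and neural networks.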
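The two ideas connect directly: a symmetric matrix A is positive definite exactly when xᵀAx > 0 for every non-zero x. The sketch below evaluates a quadratic form and tests positive definiteness for the symmetric 2×2 case using Sylvester's criterion (both leading principal minors positive). The matrices are illustrative.

```python
def quadratic_form(a, x):
    """Evaluate x^T A x for a square matrix a and vector x."""
    n = len(x)
    return sum(x[i] * a[i][j] * x[j] for i in range(n) for j in range(n))

def is_positive_definite_2x2(a):
    """Sylvester's criterion; assumes a is a symmetric 2x2 matrix."""
    return a[0][0] > 0 and a[0][0] * a[1][1] - a[0][1] * a[1][0] > 0

A = [[2, 1], [1, 2]]  # eigenvalues 1 and 3
B = [[1, 2], [2, 1]]  # eigenvalues -1 and 3

print(is_positive_definite_2x2(A), quadratic_form(A, [1, -1]))  # True 2
print(is_positive_definite_2x2(B), quadratic_form(B, [1, -1]))  # False -2
```

For B, the direction [1, -1] gives a negative value, so the surface xᵀBx curves downward along it and is not bowl-shaped.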
Hadamard product: Element-wise multiplication of vectors or matrices of the same shape. Used in gating and attention mechanisms in NLP and in feature-wise transformations.
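A minimal sketch of the Hadamard product on vectors, used here as a toy "gate" that decides how much of each feature passes through. The gate values are made up for illustration.

```python
def hadamard(u, v):
    """Element-wise product of two vectors of the same length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [a * b for a, b in zip(u, v)]

features = [1.0, 2.0, 3.0]
gate = [0.0, 0.5, 1.0]  # 0 blocks a feature, 1 lets it through unchanged

print(hadamard(features, gate))  # [0.0, 1.0, 3.0]
```

Unlike the dot product, the result is a vector of the same length rather than a single number.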
Analytic geometry: Helps in understanding transformations and provides geometric interpretations of ML solutions, such as SVMs.
I share all my notes and free resources on GitHub and, if possible, I will create short video explanations for them.
Thank you so much for reading!
If you are as excited as I am, do follow along.
I share my experiences and learnings in my newsletters. If you find them helpful and aligned with your interests, please spread the word and consider subscribing. (It's free, but would mean a lot to me.)
Until next week. Ciao!!
Notes from the 3b1b Essence of Linear Algebra playlist: https://shorturl.at/lAT9N
Book's section with annotations on GitHub: https://shorturl.at/GTsXk