Understanding Vector Norms: A Comprehensive Guide to L1, L2, L∞, and Beyond...
Divesh Kubal
Senior Data Scientist at CrimsonAI | Expert in Generative AI, LLMs, & Deep Learning | Specializing in Model Optimization and Scalable ML Solutions | Passionate AI Blogger & Researcher
In our previous article, we discussed regularization in a simplified manner. Before looking into the different types of regularization, it's crucial to understand the concept of vector norms. This article will begin by covering the basics and characteristics of norms, and then provide an overview of the most commonly used vector norms. I have tried to keep it as simple as possible.
Importance of Vector Norms
Many applications such as search engines, personalized recommendations, sorting documents, and processing images rely on measuring how similar or different items are. Similarity is determined by how close items are to each other, and dissimilarity by how far apart they are.
To calculate this distance, each item is represented as a vector in an n-dimensional space, where each dimension corresponds to a specific attribute or feature of the item. By using these vector representations, we can compute distances between pairs of items using standard similarity measures like the Manhattan distance or Euclidean distance.
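To make these two distance measures concrete, here is a minimal pure-Python sketch (the item vectors are made-up examples for illustration):

```python
import math

def manhattan_distance(a, b):
    # L1 distance: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    # L2 distance: straight-line distance between the two points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two items represented as 3-dimensional feature vectors
item_a = [1.0, 2.0, 3.0]
item_b = [4.0, 6.0, 3.0]

print(manhattan_distance(item_a, item_b))  # 7.0
print(euclidean_distance(item_a, item_b))  # 5.0
```

Note that each distance is just the corresponding norm applied to the difference vector a − b, which is why norms are the foundation of these similarity measures.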
In these similarity calculations, vector norms play a crucial role. They are fundamental in machine learning contexts, helping to quantify the size or magnitude of vectors, which in turn influences how distances between items are interpreted and computed.
What is a Norm?
In simple terms, a norm is a way to measure how big a vector, matrix, or tensor is.
Essentially, norms are functions that help us understand the size of these mathematical objects (vectors). For example, the norm of a vector x tells us its length measured from the origin.
Applications of Norms in Machine Learning
In machine learning, norms are important in several ways:

- Regularization: L1 and L2 norms are used as penalty terms on model coefficients to manage complexity and reduce overfitting.
- Distance and similarity: norms underpin distance measures such as Manhattan and Euclidean distance, used in search, recommendation, and clustering.
- Loss functions and optimization: many objectives, such as mean squared error, are built on the norm of an error vector.
- Sparsity: the L1 norm (and the L0 count) encourages sparse solutions that keep only the most significant features.
Overall, norms are a fundamental concept in machine learning that help with measuring, regularizing, and optimizing models.
Representation of Norms
The norm of any vector x is depicted by putting double bars around it, like this:
Norm of vector x = ||x||
L1 Norm / Manhattan Norm
The L1 norm of a vector is calculated by adding together the absolute values of all its components. For a vector x with just two components, its L1 norm is:

||x||₁ = |x₁| + |x₂|
The norm represented with a subscript of one is known as the Manhattan or taxicab norm, named after the borough of Manhattan in New York City. This norm reflects the distance a taxi would travel along a grid of streets from the origin to reach point x.
Mathematical Expression of L1 Norm
The L1 norm can be mathematically written as:

||x||₁ = |x₁| + |x₂| + ... + |xₙ| = ∑ᵢ |xᵢ|
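A minimal sketch of this formula in plain Python (the example vector is illustrative):

```python
def l1_norm(x):
    # Sum of the absolute values of all components
    return sum(abs(xi) for xi in x)

print(l1_norm([3, -4]))  # 7
```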
Properties of the L1/Manhattan Norm

Like every norm, the L1 norm is non-negative, equals zero only for the zero vector, scales with the vector (||c·x||₁ = |c|·||x||₁), and satisfies the triangle inequality. In regularization, the L1 penalty is notable for encouraging sparsity by driving small coefficients to exactly zero.
L2 Norm / Euclidean Norm
The L2 norm is the most commonly used norm and is widely applicable in real-world scenarios. It measures the shortest straight-line distance from the origin to the vector. Mathematically, it is computed as the square root of the sum of the squares of the vector's components. For our vector X, the L2 norm would be calculated as:
The L2 norm is also called the Euclidean norm, named after Euclid, the ancient Greek mathematician regarded as the father of geometry. Essentially, the Euclidean norm corresponds to the Euclidean distance from the origin.
Mathematical Notation of L2 Norm:

||x||₂ = √(x₁² + x₂² + ... + xₙ²) = √(∑ᵢ xᵢ²)
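The same calculation as a short plain-Python sketch (the example vector is illustrative):

```python
import math

def l2_norm(x):
    # Square root of the sum of squared components
    return math.sqrt(sum(xi ** 2 for xi in x))

print(l2_norm([3, -4]))  # 5.0
```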
Important Properties of the L2 Norm

The L2 norm satisfies the same defining properties as every norm. In addition, it is rotation-invariant, smooth and differentiable everywhere except at the origin, and the squaring penalizes large components more heavily, which is why L2 regularization shrinks weights smoothly rather than zeroing them out.
L∞/Max Norm
The L∞ norm, also known as the max norm, is defined as the largest absolute value among the vector's components. For instance, in a 2D vector X with components x₁ and x₂ (where |x₁| > |x₂|), the L∞ norm would be |x₁|.
Mathematical Notation of L∞

||x||∞ = maxᵢ |xᵢ|
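And the max norm as a plain-Python sketch (the example vector is illustrative):

```python
def linf_norm(x):
    # Largest absolute value among the components
    return max(abs(xi) for xi in x)

print(linf_norm([3, -7, 5]))  # 7
```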
Lp Norm
Now, let's generalize to the p-norm. Essentially, we can derive all the other norms from the p-norm by adjusting the value of p: substituting p = 1, p = 2, or letting p → ∞ in the formula below yields the L1, L2, and L∞ norms, respectively.

||x||ₚ = (∑ᵢ |xᵢ|ᵖ)^(1/p)
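A single plain-Python function captures this generalization; plugging in p = 1 and p = 2 recovers the L1 and L2 norms (the example vector is illustrative):

```python
def lp_norm(x, p):
    # General p-norm: (sum of |x_i|^p)^(1/p)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

print(lp_norm([3, -4], 1))  # 7.0  (matches the L1 norm)
print(lp_norm([3, -4], 2))  # 5.0  (matches the L2 norm)
```

In practice, NumPy's np.linalg.norm computes all of these directly via its ord parameter (ord=1, ord=2, ord=np.inf).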
What happens when p equals zero? / L₀ "Norm" and Sparsity
When p equals zero, we get what is often referred to as the L₀ "norm." Strictly speaking, it is not a norm because it doesn't satisfy the homogeneity property that norms must have (scaling a vector by a constant c should scale its norm by |c|). However, the L₀ "norm" is valuable when we want to count the number of non-zero components in a vector. This is particularly useful for modeling sparsity, a crucial concept in machine learning: sparsity helps enhance robustness and prevent overfitting by identifying and leveraging the most significant features while discarding less important ones.
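The count of non-zero components is trivial to sketch in plain Python (the example vector is illustrative):

```python
def l0_count(x):
    # Number of non-zero components; not a true norm
    # (fails homogeneity: scaling x doesn't scale the count)
    return sum(1 for xi in x if xi != 0)

sparse_weights = [0, 3.5, 0, -2, 0]
print(l0_count(sparse_weights))  # 2
```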
Conclusion
In conclusion, vector norms are essential tools in machine learning for quantifying the size and distance between data points represented as vectors. This article has explored fundamental norms such as the L1, L2, and L∞ norms, each serving distinct purposes from penalizing coefficients in regularization techniques to measuring distances in algorithms like SVMs. Understanding these norms not only aids in mathematical clarity but also enhances model performance by managing complexity, addressing outliers, and promoting sparsity.
Stay tuned for the next article, where we will dive into L1 and L2 Regularization, exploring their mathematical foundations and practical applications in machine learning.