Understanding Vector Norms: A Comprehensive Guide to L1, L2, L∞, and Beyond...
Divesh Kubal
Senior Data Scientist at CrimsonAI | Expert in Generative AI, LLMs, & Deep Learning | Specializing in Model Optimization and Scalable ML Solutions | Passionate AI Blogger & Researcher
In our previous article, we discussed regularization in a simplified manner. Before looking into the different types of regularization, it's crucial to understand the concept of vector norms. This article will begin by covering the basics and characteristics of norms, and then provide an overview of the most commonly used vector norms. I have tried to keep it as simple as possible.
Importance of Vector Norms
Many applications such as search engines, personalized recommendations, sorting documents, and processing images rely on measuring how similar or different items are. Similarity is determined by how close items are to each other, and dissimilarity by how far apart they are.
To calculate this distance, each item is represented as a vector in an n-dimensional space, where each dimension corresponds to a specific attribute or feature of the item. By using these vector representations, we can compute distances between pairs of items using standard similarity measures like the Manhattan distance or Euclidean distance.
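To make these two distance measures concrete, here is a minimal pure-Python sketch (the item vectors are made-up examples for illustration):

```python
import math

def manhattan_distance(a, b):
    # L1 distance: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    # L2 distance: straight-line distance between the two points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two items represented as 3-dimensional feature vectors
item_a = [1.0, 2.0, 3.0]
item_b = [4.0, 6.0, 3.0]

print(manhattan_distance(item_a, item_b))  # 7.0
print(euclidean_distance(item_a, item_b))  # 5.0
```

Note that each distance is just the corresponding norm applied to the difference vector a − b, which is why norms are the foundation of these similarity measures.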
In these similarity calculations, vector norms play a crucial role. They are fundamental in machine learning contexts, helping to quantify the size or magnitude of vectors, which in turn influences how distances between items are interpreted and computed.
What is a Norm?
In simple terms, a norm is a way to measure how big a vector, matrix, or tensor is.
Essentially, norms are functions that help us understand the size of these mathematical objects (vectors). For example, the norm of a vector x tells us its length measured from the origin.
Applications of Norms in Machine Learning
In machine learning, norms are important in several ways:

- Regularization: L1 and L2 norms are used as penalty terms on model coefficients to manage complexity and reduce overfitting.
- Distance and similarity: norms underpin distance measures such as Manhattan and Euclidean distance, used in search, recommendation, and clustering.
- Loss functions and optimization: many objectives, such as mean squared error, are built on the norm of an error vector.
- Sparsity: the L1 norm (and the L0 count) encourages sparse solutions that keep only the most significant features.
Overall, norms are a fundamental concept in machine learning that help with measuring, regularizing, and optimizing models.
Representation of Norms
The norm of any vector x is depicted by putting double bars around it, like this:
Norm of vector x = ||x||
L1 Norm / Manhattan Norm
The L1 norm of a vector is calculated by adding together the absolute values of all its components. For a vector x with just two components, its L1 norm is:

||x||₁ = |x₁| + |x₂|
The norm represented with a subscript of one is known as the Manhattan or taxicab norm, named after the borough of Manhattan in New York City. This norm reflects the distance a taxi would travel along a grid of streets from the origin to reach point x.
Mathematical Expression of L1 Norm
The L1 norm can be mathematically written as:

||x||₁ = |x₁| + |x₂| + ... + |xₙ| = ∑ᵢ |xᵢ|
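A minimal sketch of this formula in plain Python (the example vector is illustrative):

```python
def l1_norm(x):
    # Sum of the absolute values of all components
    return sum(abs(xi) for xi in x)

print(l1_norm([3, -4]))  # 7
```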
Properties of the L1/Manhattan Norm

Like every norm, the L1 norm is non-negative, equals zero only for the zero vector, scales with the vector (||c·x||₁ = |c|·||x||₁), and satisfies the triangle inequality. In regularization, the L1 penalty is notable for encouraging sparsity by driving small coefficients to exactly zero.
L2 Norm / Euclidean Norm
The L2 norm is the most commonly used norm and is widely applicable in real-world scenarios. It measures the shortest straight-line distance from the origin to the vector. Mathematically, it is computed as the square root of the sum of the squares of the vector's components. For our vector X, the L2 norm would be calculated as:
The L2 norm is also called the Euclidean norm, named after Euclid, the ancient Greek mathematician regarded as the father of geometry. Essentially, the Euclidean norm corresponds to the Euclidean distance from the origin.
Mathematical Notation of L2 Norm:

||x||₂ = √(x₁² + x₂² + ... + xₙ²) = √(∑ᵢ xᵢ²)
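The same calculation as a short plain-Python sketch (the example vector is illustrative):

```python
import math

def l2_norm(x):
    # Square root of the sum of squared components
    return math.sqrt(sum(xi ** 2 for xi in x))

print(l2_norm([3, -4]))  # 5.0
```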
Important Properties of the L2 Norm

The L2 norm satisfies the same defining properties as every norm. In addition, it is rotation-invariant, smooth and differentiable everywhere except at the origin, and the squaring penalizes large components more heavily, which is why L2 regularization shrinks weights smoothly rather than zeroing them out.
L∞/Max Norm
The L∞ norm, also known as the max norm, is defined as the largest absolute value among the vector's components. For instance, in a 2D vector X with components x₁ and x₂ (where |x₁| > |x₂|), the L∞ norm would be |x₁|.
Mathematical Notation of L∞

||x||∞ = maxᵢ |xᵢ|
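And the max norm as a plain-Python sketch (the example vector is illustrative):

```python
def linf_norm(x):
    # Largest absolute value among the components
    return max(abs(xi) for xi in x)

print(linf_norm([3, -7, 5]))  # 7
```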
Lp Norm
Now, let's generalize to the p-norm. Essentially, we can derive all the other norms from the p-norm by adjusting the value of p: substituting p = 1, p = 2, or letting p → ∞ in the formula below yields the L1, L2, and L∞ norms, respectively.

||x||ₚ = (∑ᵢ |xᵢ|ᵖ)^(1/p)
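A single plain-Python function captures this generalization; plugging in p = 1 and p = 2 recovers the L1 and L2 norms (the example vector is illustrative):

```python
def lp_norm(x, p):
    # General p-norm: (sum of |x_i|^p)^(1/p)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

print(lp_norm([3, -4], 1))  # 7.0  (matches the L1 norm)
print(lp_norm([3, -4], 2))  # 5.0  (matches the L2 norm)
```

In practice, NumPy's np.linalg.norm computes all of these directly via its ord parameter (ord=1, ord=2, ord=np.inf).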
What happens when p equals zero? / L₀ "Norm" and Sparsity
When p equals zero, we get what is often referred to as the L₀ "norm." Strictly speaking, it is not a norm because it doesn't satisfy the homogeneity property that norms must have (scaling a vector by a constant c should scale its norm by |c|). However, the L₀ "norm" is valuable when we want to count the number of non-zero components in a vector. This is particularly useful for modeling sparsity, a crucial concept in machine learning: sparsity helps enhance robustness and prevent overfitting by identifying and leveraging the most significant features while discarding less important ones.
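The count of non-zero components is trivial to sketch in plain Python (the example vector is illustrative):

```python
def l0_count(x):
    # Number of non-zero components; not a true norm
    # (fails homogeneity: scaling x doesn't scale the count)
    return sum(1 for xi in x if xi != 0)

sparse_weights = [0, 3.5, 0, -2, 0]
print(l0_count(sparse_weights))  # 2
```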
Conclusion
In conclusion, vector norms are essential tools in machine learning for quantifying the size and distance between data points represented as vectors. This article has explored fundamental norms such as the L1, L2, and L∞ norms, each serving distinct purposes from penalizing coefficients in regularization techniques to measuring distances in algorithms like SVMs. Understanding these norms not only aids in mathematical clarity but also enhances model performance by managing complexity, addressing outliers, and promoting sparsity.
Stay tuned for the next article, where we will dive into L1 and L2 Regularization, exploring their mathematical foundations and practical applications in machine learning.