
Machine learning: intuition of 'cosine similarity' in recommendation

Chen Yang

Machine & Deep Learning | Big Data Cloud

Published: January 24, 2018

Recommendation systems in machine learning often rely on a measure of similarity. One of the most commonly used options is 'cosine similarity'. I wrote this article to explain the intuition behind why the formula can measure similarity.

For recommendation in machine learning, suppose you aim to recommend products/services to clients in a way that matches their taste. You have samples of ratings that clients gave to products/services. From those samples you find the similarity among users or among products/services, and based on that similarity you infer how a client would rate a specific product/service they have not rated yet. The inferred ratings in turn help you recommend more products/services to the client. The formula most often chosen to measure similarity is the 'cosine', as shown in the slide above.

The slide above gives an example of clients rating different movies. The first column lists the clients and the first row lists the movies. The cell in a client's row and a movie's column gives the rating that client gave that movie. Using the cosine formula, we can calculate the similarity between clients Ofer and Danny as 0.993, which means they have very similar tastes. Since Danny has not rated the movie 'Argo', we can infer his rating by scaling the known rating of 4 for that movie by the similarity: 0.993*4 = 3.972.
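The calculation described above can be sketched in a few lines of Python. The rating vectors below are hypothetical (the actual numbers from the slide are not reproduced here), and the names cosine_similarity, client_a and client_b are my own; only the last step, scaling a known rating by the similarity, mirrors the 0.993*4 example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Hypothetical ratings on the movies that BOTH clients have rated.
client_a = [5, 4, 1]
client_b = [4, 3, 1]

similarity = cosine_similarity(client_a, client_b)

# Client A rated one more movie (hypothetical rating: 4) that client B
# has not seen. Scale that rating by the similarity to infer B's rating.
rating_a_unseen = 4
predicted_rating_b = similarity * rating_a_unseen

print(f"similarity = {similarity:.3f}, predicted rating = {predicted_rating_b:.3f}")
```

Note that only movies rated by both clients go into the vectors; how to treat missing ratings is a separate design choice that this sketch does not address.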

Before fully understanding the cosine formula, let us first review the dry math of the dot product (or inner product) between two vectors. Suppose we have two vectors, each with two dimensions x and y: a = (a_x, a_y) and b = (b_x, b_y). Vector a, for example, projects a_x horizontally onto the x-axis and a_y vertically onto the y-axis. The dot product measures how much the two vectors point in the same direction. Its formula multiplies the vectors piece by piece, but only products within the same dimension accumulate (a_x*b_x on the x-axis and a_y*b_y on the y-axis); the cross terms such as a_x*b_y vanish because the axes are perpendicular. The result is a·b = a_x*b_x + a_y*b_y. That is why we can use the dot product to gauge similarity.
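To make the "same dimension" idea concrete, here is a minimal sketch with made-up vectors. The helper name dot is my own; the three cases show the dot product being large for vectors that point the same way, zero for perpendicular vectors, and negative for opposite ones.

```python
def dot(a, b):
    """Sum of products taken dimension by dimension."""
    return sum(x * y for x, y in zip(a, b))

same_direction = dot((3, 1), (6, 2))    # 3*6 + 1*2
perpendicular = dot((3, 1), (-1, 3))    # 3*(-1) + 1*3
opposite = dot((3, 1), (-3, -1))        # 3*(-3) + 1*(-1)

print(same_direction, perpendicular, opposite)
```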

Furthermore, we can rewrite the dot product formula by rotating both vectors together until one of them, say a, is horizontal, and taking its direction as the x-axis. Nothing changes, because the lengths of the vectors and the angle between them are preserved. In the rotated frame a = (|a|, 0) and b = (|b|*cos(theta), |b|*sin(theta)), so the same logic gives the dot product as |a|*|b|*cos(theta), where theta is the angle between the two vectors.
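The rotation argument can be checked numerically. This is only a sketch with arbitrary example vectors; the helpers dot and rotate are names I introduce for illustration.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def rotate(v, angle):
    """Rotate a 2-D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

a = (3.0, 4.0)
b = (2.0, 1.0)

# Rotate both vectors by the same angle so that a lands on the x-axis.
angle_a = math.atan2(a[1], a[0])
a_rot = rotate(a, -angle_a)
b_rot = rotate(b, -angle_a)

# Angle between the two vectors, and the |a|*|b|*cos(theta) form.
theta = math.atan2(b[1], b[0]) - angle_a
polar_form = math.hypot(*a) * math.hypot(*b) * math.cos(theta)

print(dot(a, b), dot(a_rot, b_rot), polar_form)
```

All three printed values agree up to floating-point rounding, which is the point of the argument: the component form and the |a|*|b|*cos(theta) form describe the same quantity.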

Equating the two forms gives a_x*b_x + a_y*b_y = |a|*|b|*cos(theta). Dividing both sides by |a|*|b| leads to the cosine similarity used in recommendation systems: cos(theta) = (a_x*b_x + a_y*b_y) / (|a|*|b|).
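In practice a client rates more than two movies, so the vectors have more than two dimensions. The article only works through the two-dimensional case; the standard generalization to n dimensions, written with an index i over the rated items (my notation, not the slide's), is:

```latex
\cos\theta
  = \frac{a \cdot b}{|a|\,|b|}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}}
```

For n = 2 this reduces to the formula above. The value always lies between -1 and 1, and between 0 and 1 when all ratings are non-negative.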

That basically explains where 'cosine similarity' comes from and why, through the intuition of the dot product, it can be used to measure similarity.
