Machine learning: the intuition behind 'cosine similarity' in recommendation
Recommendation systems in machine learning often rely on measuring similarity, and one of the most commonly used measures is 'cosine similarity'. I wrote this article to explain the intuition behind why the formula measures similarity.
In a recommendation setting, suppose you aim to recommend products or services to clients, tailored to their tastes. You have samples of how some clients rated products or services; from those samples you find similarity among users (or among products), and based on that similarity you infer how a client would rate a specific product. The inferred ratings then let you recommend more products to that client. The similarity measure is most commonly the cosine formula, cos(theta) = (a . b) / (|a|*|b|), i.e. the dot product of two rating vectors divided by the product of their lengths, as shown in the slide above.
The slide above gives an example of clients rating different movies. The first column lists the clients and the first row lists the movies; the cell at a client's row and a movie's column holds that client's rating of the movie. Using the cosine formula, we can calculate the similarity between clients Ofer and Danny as 0.993, which means they have similar tastes. Since Danny has not rated the movie 'Argo', we can infer his rating from Ofer's rating of 4, weighted by their similarity: 0.993 * 4 = 3.972.
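To make the inference concrete, here is a minimal Python sketch. The rating vectors for Ofer and Danny are hypothetical stand-ins for the slide's table (only movies both clients have rated are used for the similarity), so the numbers come out close to, but not exactly, the 0.993 above.

    import math

    def cosine_similarity(a, b):
        # Dot product divided by the product of the vectors' lengths.
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Hypothetical ratings over the movies both clients have rated.
    ofer = [5, 4, 4, 3]
    danny = [5, 5, 4, 3]

    sim = cosine_similarity(ofer, danny)  # close to 1 -> similar tastes
    inferred = sim * 4                    # Ofer rated 'Argo' 4
    print(round(sim, 3), round(inferred, 3))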
Before fully understanding the cosine formula, let us first review the dry math of the dot product (or inner product) between two vectors. Suppose we have two vectors, each with two dimensions x and y: a = (a_x, a_y) and b = (b_x, b_y). For example, vector a projects a_x horizontally onto the x-axis and a_y vertically onto the y-axis. The dot product aims to measure how similar two vectors are. Its formula multiplies the vectors component by component and accumulates only within matching dimensions (a_x*b_x on the x-axis and a_y*b_y on the y-axis): a . b = a_x*b_x + a_y*b_y. That is why we can use the dot product to emulate their similarity.
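In code, the dot product mirrors that description exactly: multiply component by component, then sum. A minimal 2-D sketch in Python:

    def dot(a, b):
        # (a_x * b_x) + (a_y * b_y): only matching dimensions combine.
        return a[0] * b[0] + a[1] * b[1]

    print(dot((3, 4), (6, 8)))   # 50: the vectors point the same way
    print(dot((3, 4), (4, -3)))  # 0: the vectors are perpendicular

Note how a large positive value signals alignment while zero signals perpendicularity, which is exactly the 'similarity' behavior we want.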
Furthermore, we can rewrite the dot product by rotating the coordinate frame so that one vector, say b, lies along the x-axis. Nothing changes, because the relationship between the two vectors (the angle theta between them and their lengths) stays the same. In this frame b = (|b|, 0) and a = (|a|*cos(theta), |a|*sin(theta)), so applying the same component-wise formula gives a . b = |a|*cos(theta)*|b| + |a|*sin(theta)*0 = |a|*|b|*cos(theta).
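We can check numerically that rotating both vectors by the same angle leaves the dot product unchanged, which is what justifies the trick above. A small sketch using the standard 2-D rotation:

    import math

    def rotate(v, phi):
        # Rotate vector v by angle phi (radians).
        x, y = v
        return (x * math.cos(phi) - y * math.sin(phi),
                x * math.sin(phi) + y * math.cos(phi))

    a, b = (3.0, 4.0), (6.0, 8.0)
    before = a[0] * b[0] + a[1] * b[1]
    ar, br = rotate(a, 0.7), rotate(b, 0.7)
    after = ar[0] * br[0] + ar[1] * br[1]
    print(before, round(after, 10))  # equal up to floating-point error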
Equating the two forms gives a_x*b_x + a_y*b_y = |a|*|b|*cos(theta), and solving for cos(theta) yields exactly the cosine similarity used in recommendation systems: cos(theta) = (a_x*b_x + a_y*b_y) / (|a|*|b|).
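A quick numerical check of this identity, using math.atan2 to recover each vector's angle:

    import math

    a, b = (3.0, 4.0), (5.0, 1.0)
    lhs = a[0] * b[0] + a[1] * b[1]                          # component form
    theta = math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])  # angle between them
    rhs = math.hypot(*a) * math.hypot(*b) * math.cos(theta)  # |a|*|b|*cos(theta)
    print(round(lhs, 10), round(rhs, 10))                    # both sides agree (19.0)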
That basically explains where 'cosine similarity' comes from, and why, from the intuition of the dot product, it can be used to measure similarity.
Chen Yang