Similarity-Based Recommender System Algorithms
Chandra Prakash Bathula
Adjunct Faculty at Saint Louis University | Machine Learning Practitioner | Web Developer | GenAI Developer
So, among the types of recommender systems we discussed previously, the simplest type that we can design or build is the “Similarity Based Recommender System”.
In similarity based recommender systems there are two types of similarity that we can make use of.
They are user-user similarity and item-item similarity.
One interesting fact is that item-item similarity was popularized by Amazon in 2003. It is typically implemented in large-scale, e-commerce-like systems.
Now, from a practical standpoint, let us first look at user-user similarity.
User-User Similarity:
Imagine we are given a massive matrix A, with users Ui as rows and items Ij as columns. Now let’s look at the row for user Ui and, for simplicity, treat it as a column vector.
This vector can be thought of as the user vector. It is a sparse vector and is very similar to Bag of Words. Remember that a Bag of Words vector is a sparse vector of counts.
Here, each entry is the rating that user Ui gave to item Ij. Now we can define the similarity of users Ui and Uj as the cosine similarity between their vectors, which is the dot product of the two vectors divided by the product of their L2 norms.
=> Sim(Ui, Uj) = cosine(Ui, Uj) = (Ui . Uj) / (||Ui|| ||Uj||).
Given this matrix A, we can compute the similarity for every pair of users using this user-vector representation.
Let’s call the similarity between users Ui and Uj “Sij”.
Imagine building a matrix “S” with entries Sij; this is nothing but the user-similarity matrix.
Here we are using cosine similarity, but we could use any similarity measure. Cosine is the more popular choice because these are sparse vectors.
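To make the construction of S concrete, here is a minimal sketch in plain Python. The toy ratings, the dict-based sparse representation and the function names cosine and similarity_matrix are all assumptions made for illustration; they are not part of the original discussion.

```python
import math

# Hypothetical toy ratings: user -> {item: rating}.
# Items a user has not rated are simply absent, which keeps each vector sparse.
ratings = {
    "U1": {"I1": 5, "I2": 4},
    "U2": {"I1": 4, "I2": 5, "I3": 1},
    "U3": {"I3": 5, "I4": 4},
}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(r * v[i] for i, r in u.items() if i in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """Return S as a nested dict with S[a][b] = cosine(vectors[a], vectors[b])."""
    return {a: {b: cosine(va, vb) for b, vb in vectors.items()}
            for a, va in vectors.items()}

S = similarity_matrix(ratings)
```

In this toy data U1 and U2 rated the same items similarly, so S["U1"]["U2"] is high, while U1 and U3 share no rated item, so S["U1"]["U3"] is 0.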
So, given this similarity matrix, how does it look visually?
Let’s assume we are given a user U10, and the task is to recommend new items to U10.
What we can do is go to the row for U10 in the similarity matrix and look at that vector: the larger a value, the more similar that user is to U10. If U1, U2 & U7 turn out to be the three most similar users to U10, then we can say that they gave similar ratings.
=> U10 -> U1, U2 & U7 are the most similar users to U10.
Recall that this user-similarity matrix was built from the ratings given by the users.
Now we can pick the items liked by U1, U2 & U7 that U10 has not yet watched and offer them as recommendations.
This is how a “user-user” similarity based recommender system works, step by step.
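The steps above can be sketched as follows. This is only an illustration under stated assumptions: the toy ratings, the function name recommend_user_user, the "liked" threshold of 4 stars and the similarity-weighted ordering of the recommended items are choices made here, not something the text prescribes.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(r * v[i] for i, r in u.items() if i in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend_user_user(ratings, target, k=3, liked=4):
    """Return (k most similar users, unseen items those users liked)."""
    # Step 1: the target's row of the user-similarity matrix.
    sims = {u: cosine(ratings[target], vec)
            for u, vec in ratings.items() if u != target}
    # Step 2: keep the k users with the largest similarity.
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    # Step 3: collect items the neighbours liked that the target has not rated.
    seen = set(ratings[target])
    scores = {}
    for u in neighbours:
        for item, r in ratings[u].items():
            if item not in seen and r >= liked:
                # Assumed ranking rule: weight each rating by the similarity.
                scores[item] = scores.get(item, 0.0) + sims[u] * r
    return neighbours, sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "U10": {"A": 5, "B": 4},
    "U1": {"A": 5, "B": 4, "C": 5},
    "U2": {"A": 4, "B": 5, "D": 4},
    "U7": {"A": 5, "B": 3, "C": 4, "E": 2},
    "U5": {"F": 5, "G": 4},
}
neighbours, recs = recommend_user_user(ratings, "U10")
```

With this data the three nearest neighbours of U10 are U1, U2 and U7, and the items recommended are C and D. Item E is skipped because U7 rated it below the assumed threshold, and U5's items never appear because U5 shares no rated item with U10.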
Take YouTube as an example: today I might be buying a smart watch, so I’ll look at all the product review videos, or I might listen to a new artist’s songs.
With behaviour like this, user-user recommendation becomes much, much harder. Our preferences and tastes evolve over time, and user preferences tend to change frequently, so a user vector built from old ratings goes stale. If preferences are not changing too frequently, we can use only the latest data (for example, the last 90 days) to build the matrix A.
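As a small sketch of the "latest data only" idea, the snippet below keeps just the ratings from a recent window before building the user vectors. The event format (user, item, rating, date), the function name recent_ratings and the sample events are hypothetical; only the 90-day window comes from the text.

```python
from datetime import date, timedelta

def recent_ratings(events, today, window_days=90):
    """Build user -> {item: rating} from events no older than window_days."""
    cutoff = today - timedelta(days=window_days)
    ratings = {}
    for user, item, rating, when in events:
        if when >= cutoff:
            # If a user rated an item more than once, the last event listed wins.
            ratings.setdefault(user, {})[item] = rating
    return ratings

# Hypothetical rating events: (user, item, rating, date of rating).
events = [
    ("U10", "smartwatch_review", 5, date(2024, 5, 20)),
    ("U10", "old_cartoon", 4, date(2023, 1, 3)),
    ("U1", "smartwatch_review", 4, date(2024, 4, 2)),
]
A_recent = recent_ratings(events, today=date(2024, 6, 1))
```

Here the rating from January 2023 falls outside the window and is dropped, so U10's vector reflects only the recent interest.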
So, the alternative approach is “item-item” based similarity recommender system.
So, let’s see how to avoid this problem by using “item-item” similarity.
Item-Item based Recommender System:
It is a very simple idea and is similar to the “user-user” scheme, except that instead of representing each user “Ui” as a vector we now represent each item “Ij” as a vector. This vector again comes from the “A” matrix: item Ij is column j of A.
Ij = [ ]
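A minimal sketch of the item-item scheme is given below. It transposes the ratings so that each item is a vector over users, then ranks the other items by cosine similarity. The toy data and the function names item_vectors and similar_items are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(r * v[i] for i, r in u.items() if i in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def item_vectors(ratings):
    """Transpose user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items

def similar_items(ratings, item, k=2):
    """Return the k items whose rating vectors are most similar to `item`."""
    vecs = item_vectors(ratings)
    sims = {j: cosine(vecs[item], v) for j, v in vecs.items() if j != item}
    return sorted(sims, key=sims.get, reverse=True)[:k]

# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "U1": {"I1": 5, "I2": 4},
    "U2": {"I1": 4, "I2": 5},
    "U3": {"I1": 5, "I3": 2},
    "U4": {"I3": 5},
}
top = similar_items(ratings, "I1", k=1)
```

In this example I2 was rated by the same users as I1 and with similar scores, so it comes out as the item most similar to I1. A user who liked I1 could then be recommended I2.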
Here, one key advantage of the item-based approach is that:
“→ Ratings on a given item do not change significantly after the initial period.”
Let’s take an example: a very popular movie, Jurassic Park.
When Jurassic Park was released, there were probably a lot of ratings in the first few days; let’s assume the average rating was 4/5 stars. After that initial period, most people recognize Jurassic Park as a brilliant movie in every aspect, be it graphics or story, so its rating would not change significantly.
Now, as a rule of thumb:
We know that item ratings do not change much over time once the initial period has passed.
Given this, when you have more users than items it is better to use an item-item similarity RS over a user-user RS. With fewer items than users there are fewer vectors to compare, so computing item-item similarities is also easier.
=> Su = similarity matrix for users; Si = similarity matrix for items.
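To get a feel for the size difference between Su and Si, the sketch below counts how many distinct pairs each matrix needs. The function names and the user and item counts are hypothetical, and the comparison deliberately ignores everything except the number of pairs (for instance, how long each vector is or how stable the ratings are).

```python
def pair_count(n):
    """Distinct pairs among n vectors: the entries above the diagonal of a
    symmetric n x n similarity matrix."""
    return n * (n - 1) // 2

def cheaper_scheme(n_users, n_items):
    """Rough heuristic: pick the scheme with fewer pairs to compare."""
    return "item-item" if n_users > n_items else "user-user"

# Hypothetical sizes: one million users, ten thousand items.
user_pairs = pair_count(1_000_000)  # pairs needed for Su
item_pairs = pair_count(10_000)     # pairs needed for Si
choice = cheaper_scheme(1_000_000, 10_000)
```

With these assumed sizes Si needs roughly ten thousand times fewer pairs than Su, which is in line with the rule of thumb above.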