Similarity Based Recommendation Systems Algorithms.


Photo by Kelly Sikkema on Unsplash
Among the types of recommender systems we have discussed previously, the simplest type that we can design or build is the “Similarity Based Recommender System”.

There are two types of similarity that we can use in similarity based recommender systems:

  • Item-Item based Similarity
  • User-User based Similarity

One interesting fact is that item-item based similarity was popularized by Amazon in 2003. It is typically implemented in large-scale systems such as e-commerce platforms.

Now, from a practical standpoint, let us first look at user-user based similarity.

User-User Similarity:

Imagine we are given a massive matrix A of ratings, with users Ui as rows and items Ij as columns, so that the entry Aij is the rating user Ui gave to item Ij. Now let’s look at the row for user Ui and, for simplicity, treat it as a column vector.

[Figure: Similarity Matrix]
[Figure: User and Item Features]
This vector can be thought of as the user vector. It is a sparse vector, because each user rates only a few of the items, and in that sense it is very similar to Bag of Words, which is also a sparse vector, one that holds word counts.
[Figure: User vector, a sparse vector similar to Bag of Words]
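To make the matrix A and the sparse user vector concrete, here is a minimal sketch in plain Python. The toy ratings, the names `build_user_vectors` and `to_dense_row`, and the convention that 0 means "not rated" are all assumptions made for illustration; a real system would use a sparse-matrix library.

```python
# Hypothetical toy data: (user, item, rating) triples.
ratings = [
    ("U1", "I1", 5), ("U1", "I2", 4),
    ("U2", "I1", 4), ("U2", "I3", 2),
    ("U3", "I2", 1), ("U3", "I3", 5),
]


def build_user_vectors(ratings):
    """Return {user: {item: rating}}. Unrated items are simply absent,
    which is what makes the representation sparse."""
    vectors = {}
    for user, item, rating in ratings:
        vectors.setdefault(user, {})[item] = rating
    return vectors


def to_dense_row(user_vector, items):
    """Expand one sparse user vector into a dense row of A (0 = not rated)."""
    return [user_vector.get(item, 0) for item in items]


user_vectors = build_user_vectors(ratings)
items = sorted({item for _, item, _ in ratings})

# Rows are users (U1, U2, U3), columns are items (I1, I2, I3).
A = [to_dense_row(user_vectors[user], items) for user in sorted(user_vectors)]
```

Most entries of a real A would be zero, which is why the dictionary form is the one worth keeping in memory.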

Each entry of this vector is the rating that user Ui gave to item Ij. Now we can define the similarity of two users Ui and Uj as the cosine similarity of their vectors, which is their dot product divided by the product of their L2 norms.

=> Sim(Ui, Uj) = cosine(Ui, Uj) = (Ui · Uj) / (||Ui|| ||Uj||).

[Figure: Cosine Similarity Computation]
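As a small worked example of the cosine computation, the sketch below compares two hypothetical user vectors. The function name `cosine_similarity` and the choice to return 0.0 for an all-zero vector are assumptions, not something the article specifies.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: a user with no ratings is similar to nobody.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical rating vectors over three items (0 = not rated).
u1 = [5, 4, 0]
u2 = [4, 0, 2]
sim = cosine_similarity(u1, u2)  # 20 / (sqrt(41) * sqrt(20))
```

Only the items both users rated contribute to the dot product, so two users with no items in common get a similarity of 0.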

Given this matrix ‘A’, we can compute the similarity for every pair of users from their user vectors.

Let’s call the similarity between users Ui and Uj “Sij”.

Now imagine building a matrix “S” whose entries are Sij. This is nothing but the user-user similarity matrix.

Here we are using cosine similarity, but we could use any similarity measure; cosine similarity is the more popular choice because these are sparse vectors.
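Building the user-user similarity matrix S from A can be sketched as follows. This is an illustrative brute-force version over a tiny hypothetical A; the helper names are assumptions, and comparing every pair of rows like this would be too slow for a genuinely massive matrix.

```python
import math


def cosine_similarity(u, v):
    """Dot product divided by the product of L2 norms (0.0 for empty vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(rows):
    """Return S where S[i][j] is the cosine similarity of rows i and j."""
    n = len(rows)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = cosine_similarity(rows[i], rows[j])
            S[i][j] = s
            S[j][i] = s  # similarity is symmetric, so fill both halves
    return S


# Hypothetical ratings matrix: rows are users, columns are items.
A = [
    [5, 4, 0],
    [4, 0, 2],
    [0, 1, 5],
]
S = similarity_matrix(A)
```

S is square with one row and one column per user, and its diagonal is 1 for every user who has rated at least one item.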

So, given this similarity matrix, how does it look visually?

[Figure: User-User Similarity Matrix]

Let’s assume we are given a user U10, and the task is to recommend new items to U10.

What we can do is go to the row for U10 in the similarity matrix and look at that vector: the larger a value is, the more similar that user is to U10. If U1, U2 & U7 turn out to be the three most similar users to U10, then we can say that they gave ratings similar to those of U10.

=> U10 -> U1, U2 & U7 are the most similar users to U10.

Remember that this user-similarity matrix was built from the ratings given by the users.
So what we can do is pick the items liked by U1, U2 & U7 that U10 has not yet watched, and offer them as recommendations.


[Figure: Items picked by similar users are recommended to the user]
This is how a “user-user” similarity based recommender system works, step by step.
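The steps above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the toy matrix, the function name `recommend_user_based`, treating a rating of 4 or more as "liked", treating 0 as "not yet watched", and ranking candidates by how many neighbours liked them.

```python
import math


def cosine_similarity(u, v):
    """Dot product divided by the product of L2 norms (0.0 for empty vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_user_based(A, target, k=3, liked_threshold=4):
    """Return item indices liked by the k users most similar to `target`
    that `target` has not rated, most frequently liked first."""
    sims = [(cosine_similarity(A[target], row), idx)
            for idx, row in enumerate(A) if idx != target]
    neighbours = [idx for _, idx in sorted(sims, reverse=True)[:k]]

    votes = {}
    for idx in neighbours:
        for item, rating in enumerate(A[idx]):
            if rating >= liked_threshold and A[target][item] == 0:
                votes[item] = votes.get(item, 0) + 1
    return sorted(votes, key=lambda item: (-votes[item], item))


# Hypothetical ratings: rows are users, columns are items; row 3 is the target.
A = [
    [5, 4, 0, 0, 5],
    [4, 5, 0, 0, 4],
    [0, 0, 5, 4, 0],
    [5, 5, 0, 0, 0],
]
top = recommend_user_based(A, target=3, k=2)
```

With `k=2` the two nearest neighbours are rows 1 and 0, both of whom liked item 4, so that is the only recommendation. Raising `k` to 3 pulls in the dissimilar user in row 2 and adds that user's items to the tail of the list.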

  • But there is one problem with a “user-user” similarity based recommender system.
  • The problem is that users’ preferences change over time, and a user-user similarity matrix has no way of capturing this.

Take YouTube as an example: today I might be buying a smart watch, so I’ll watch a lot of product review videos, or I might start listening to a new artist’s songs.


With changing preferences, recommending becomes much harder. Our preferences (or tastes) evolve over time, and users’ preferences tend to change frequently. If they are not changing too frequently, we can use only the latest data (for example, the last 90 days) and build the matrix from it: A = [ ].
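The "latest data only" idea can be sketched as a simple filter applied before the matrix is built. The event format (user, item, rating, timestamp), the function name `recent_ratings`, and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def recent_ratings(events, now, window_days=90):
    """Keep only the ratings made within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    return [(user, item, rating)
            for user, item, rating, timestamp in events
            if timestamp >= cutoff]


# Hypothetical rating events: (user, item, rating, timestamp).
events = [
    ("U10", "I1", 5, datetime(2024, 5, 20)),
    ("U10", "I2", 4, datetime(2023, 1, 1)),   # too old, dropped
    ("U1", "I1", 3, datetime(2024, 3, 10)),
]
latest = recent_ratings(events, now=datetime(2024, 6, 1))
```

The matrix A would then be built from `latest` instead of the full history, at the cost of throwing away older ratings that may still be informative.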

So, the alternative approach is “item-item” based similarity recommender system.

So, let’s see how to avoid this problem by using “item-item” similarity.

Item-Item based Recommender System:

It is a very simple idea, similar to the “user-user” scheme, except that instead of representing each user “Ui” as a vector we now represent each item “Ij” as a vector. This vector again comes from the “A” matrix: it is the column of ratings that item Ij received.

Ij = [ ]

[Figure: Cosine Similarity for Items]
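In code, the only change from the user-user case is that the columns of A are compared instead of the rows. The sketch below assumes a tiny hypothetical A; the helper names `item_vectors` and `item_similarity_matrix` are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Dot product divided by the product of L2 norms (0.0 for empty vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def item_vectors(A):
    """Transpose A so that each item becomes a vector of the ratings it received."""
    return [list(column) for column in zip(*A)]


def item_similarity_matrix(A):
    """Return Si where Si[i][j] is the cosine similarity of items i and j."""
    columns = item_vectors(A)
    return [[cosine_similarity(ci, cj) for cj in columns] for ci in columns]


# Hypothetical ratings matrix: rows are users, columns are items.
A = [
    [5, 4, 0],
    [4, 0, 2],
    [0, 1, 5],
]
Si = item_similarity_matrix(A)
```

Si has one row and one column per item, so its size depends on the number of items rather than the number of users.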

Here, the one key advantage of the item-based approach is that

“→ Ratings on a given item do not change significantly after the initial period.”

Let’s take an example: a very popular movie, Jurassic Park.

When Jurassic Park was released, it probably received a lot of ratings in the first few days; let’s assume the average rating was 4/5 stars. After that initial period, most people recognized Jurassic Park as a brilliant movie in all aspects, be it graphics or story, and its rating would not change significantly.

  • Ratings on a given product/item do not change significantly over time after the initial period.
  • In the initial period there will be a lot of positive comments, negative comments, and pros and cons, but after a while the ratings more or less stabilize.
  • This is the reason why e-commerce companies like Amazon preferred this approach. Once you have the similarity matrix, the rest is very simple.
  • Imagine we have a user U10 to whom we have to recommend products. We know which items U10 likes, say I1, I2 & I7, so to recommend a new product we take all the products that are similar to I1, and likewise for I2 and I7.

[Figure: Item Recommendation Based on Historical Data]

  • Now suppose, as an example, that I4 is present in all of those sets. Then the probability that U10 likes I4 is high.
  • In other words, since we know U10 likes I1, I2 & I7, and I4 is similar to I1 & I2, the probability of I4 being liked by U10 is high.
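The "present in all those sets" reasoning can be sketched as a simple vote count. The similarity values below are invented for illustration, and the function name `recommend_item_based`, the parameter `k`, and the tie-breaking by item name are assumptions rather than the article's method.

```python
def recommend_item_based(Si, liked, k=1):
    """For each liked item, take its k most similar items that the user has not
    liked yet, then rank candidates by how many of those sets contain them."""
    counts = {}
    for item in liked:
        candidates = [(score, other)
                      for other, score in Si[item].items()
                      if other not in liked]
        for _, other in sorted(candidates, reverse=True)[:k]:
            counts[other] = counts.get(other, 0) + 1
    return sorted(counts, key=lambda other: (-counts[other], other))


# Hypothetical item-item similarities for the three items U10 likes.
Si = {
    "I1": {"I2": 0.5, "I4": 0.9, "I5": 0.3, "I7": 0.1},
    "I2": {"I1": 0.5, "I4": 0.8, "I5": 0.4, "I7": 0.2},
    "I7": {"I1": 0.1, "I2": 0.2, "I4": 0.6, "I5": 0.7},
}
recommendations = recommend_item_based(Si, liked=["I1", "I2", "I7"], k=1)
```

Here I4 is the nearest neighbour of both I1 and I2, so it gets two votes and is ranked first, ahead of I5, which is only the nearest neighbour of I7.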

Now as a Rule of Thumb:

Suppose you have more users than items, and we know that item ratings do not change much over time after the initial period.

[Figure: More Users Than Items]

When we know this, it is better to use an item-item based similarity RS over a user-user RS. So, when there are more users than items, prefer “item-item RS”; computing item-item similarity is also easier, since there are fewer items to compare.

=> Su = similarity matrix for users; Si = similarity matrix for items.
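One rough way to see why item-item is easier to compute when users outnumber items is to count the pairs each similarity matrix has to cover. The numbers below are hypothetical, and the count ignores sparsity and any approximate nearest-neighbour tricks a real system might use.

```python
def num_pairs(n):
    """Number of distinct pairs among n vectors: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical catalogue: one million users, one thousand items.
n_users = 1_000_000
n_items = 1_000

user_pairs = num_pairs(n_users)  # entries needed for Su
item_pairs = num_pairs(n_items)  # entries needed for Si
```

With these sizes Si needs roughly a million times fewer pairwise comparisons than Su, which fits the rule of thumb above.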
