Machine Learning 10: 'Recommendation System'


Why do we care about Recommendation Systems?

The answer to this question depends on your perspective. For companies like Amazon, Spotify and Netflix, the goal is to generate more revenue and drive significant engagement on their websites, which can translate into rapid growth of their marketplace. For the people using Amazon, Spotify and Netflix, it means saving time: items that match their interests, along with items that other users like highly, show up in their suggestions, so they don't have to search for them. This is the essence of Recommendation Systems, or Recommendation Engines.

Conceptually, Recommendation Systems (or Recommendation Engines) use two types of recommendation approach:

1. Collaborative filtering (CF)

2. Content-based filtering (CBF)

Collaborative Filtering


Collaborative filtering is one of the earliest forms of recommendation system. Its earliest algorithms are also known as neighborhood-based or memory-based algorithms; when machine learning or statistical models are used instead, they are referred to as model-based algorithms. The basic idea of collaborative filtering is that, given a large database of ratings profiles recording what individual users rated or purchased, we can impute (predict) ratings for the items each user has not yet rated or purchased. These predicted ratings form the basis of recommendation scores or of a top-N list of recommended items.
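To make the model-based idea concrete, here is a minimal Python sketch of one common model-based technique, matrix factorization trained with stochastic gradient descent. It is not a method this article describes; the function names, the toy ratings, and the hyperparameters (k, lr, reg, epochs) are illustrative assumptions. Missing ratings are stored as np.nan, and after training P @ Q.T gives a predicted score for every cell, including the unrated ones.

```python
import numpy as np

def factorize(R, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P (n_users x k) and item factors Q (n_items x k) so that
    P @ Q.T approximates the observed (non-NaN) entries of the ratings matrix R."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    observed = list(zip(*np.where(~np.isnan(R))))  # (user, item) pairs that have a rating
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]
            p_u = P[u].copy()  # keep the pre-update user factors for the item step
            # SGD step on the squared error with L2 regularisation
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q

def rmse_observed(R, P, Q):
    """Root-mean-square error of the reconstruction on the observed ratings only."""
    mask = ~np.isnan(R)
    return float(np.sqrt(np.mean((R[mask] - (P @ Q.T)[mask]) ** 2)))

# Toy ratings: rows are users, columns are items, np.nan means "not rated".
R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [1.0, np.nan, np.nan, 4.0],
    [np.nan, 1.0, 5.0, 4.0],
])
P, Q = factorize(R)
predicted = P @ Q.T  # dense matrix: includes predictions for the unrated (NaN) cells
print(round(rmse_observed(R, P, Q), 3))
```

Ranking each user's NaN cells by their value in `predicted` then yields a top-N list for that user.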

User-based collaborative filtering (UBCF) is a memory-based method that works under the assumption that users with similar tastes will rate items similarly. The missing ratings for a user can therefore be predicted by finding other, similar users (a neighbourhood). Within the neighbourhood, we aggregate the neighbours' ratings on items the user has not rated and use the result as the basis for a prediction.

Item-based collaborative filtering inverts the nearest-neighbour approach. Instead of finding the users most similar to each individual, the algorithm assesses the similarity between items, based on how correlated their ratings or purchase profiles are across all users.
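An item-based version can be sketched in Python as follows. Assumptions to flag: item-item similarity here is plain cosine between rating columns with missing ratings treated as 0 (adjusted cosine or Pearson correlation are common alternatives), and a prediction is the similarity-weighted average of the user's own ratings on the k most similar items they have already rated. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def item_similarity(R):
    """Cosine similarity between item columns, treating missing ratings as 0."""
    X = np.nan_to_num(R)            # unrated cells become 0 and add nothing to dot products
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0         # items nobody rated: avoid division by zero
    Xn = X / norms
    return Xn.T @ Xn                # shape (n_items, n_items)

def predict_item_based(R, user, item, k=2):
    """Similarity-weighted average of the user's own ratings on the k rated items
    most similar to `item`."""
    S = item_similarity(R)
    rated = np.where(~np.isnan(R[user]))[0]
    rated = rated[rated != item]
    top = rated[np.argsort(S[item, rated])[::-1][:k]]  # most similar rated items first
    weights = S[item, top]
    if weights.sum() <= 0:
        return float("nan")         # nothing similar to base a prediction on
    return float(weights @ R[user, top] / weights.sum())

R = np.array([
    [5.0, 4.0, np.nan],  # user 0 has not rated item 2
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
])
print(predict_item_based(R, user=0, item=2, k=2))  # roughly 4.47
```

Because the item-item matrix depends only on the ratings, it can be precomputed once and reused for every user, which is one practical reason the item-based variant is popular.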

Some additional starter articles for learning more about collaborative filtering can be found here and here (https://recommender-systems.org/collaborative-filtering/).

How the UBCF algorithm works
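The user-based procedure described earlier can be sketched in Python. This is a minimal sketch under stated assumptions: similarity is cosine over the items both users have rated, the neighbourhood is the k most similar users who rated the target item, and predictions are mean-centred (the user's mean rating plus a similarity-weighted average of the neighbours' deviations from their own means). Those choices, the function names, and the toy matrix are ours, not the article's.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two users, computed over the items both have rated."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return 0.0
    x, y = a[both], b[both]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else 0.0

def predict_user_based(R, user, item, k=2):
    """Predict R[user, item] as the user's mean rating plus the similarity-weighted
    average deviation (rating - own mean) of the k most similar users who rated `item`."""
    means = np.nanmean(R, axis=1)  # each user's average over the items they rated
    candidates = [v for v in range(R.shape[0]) if v != user and not np.isnan(R[v, item])]
    neighbours = sorted(((cosine(R[user], R[v]), v) for v in candidates), reverse=True)[:k]
    neighbours = [(s, v) for s, v in neighbours if s > 0]  # keep positively similar users only
    den = sum(s for s, _ in neighbours)
    if den == 0:
        return float(means[user])  # no usable neighbours: fall back to the user's mean
    num = sum(s * (R[v, item] - means[v]) for s, v in neighbours)
    return float(means[user] + num / den)

def recommend_top_n(R, user, n=2, k=2):
    """Rank the user's unrated items by predicted rating and return the top n item indices."""
    unrated = [i for i in range(R.shape[1]) if np.isnan(R[user, i])]
    scores = {i: predict_user_based(R, user, i, k) for i in unrated}
    return sorted(scores, key=scores.get, reverse=True)[:n]

R = np.array([
    [5.0, 4.0, np.nan],  # user 0 has not rated item 2
    [5.0, 4.0, 3.0],
    [1.0, 2.0, 5.0],
])
print(predict_user_based(R, user=0, item=2, k=1))  # 3.5
print(recommend_top_n(R, user=0, n=1))             # [2]
```

Mean-centring matters here: user 1 rated item 2 one point below their own average, so the prediction for user 0 lands one point below user 0's average rather than copying the raw rating.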



Strengths & Weaknesses of Neighborhood Methods

Strengths: neighborhood methods are simple to implement, and their recommendations are easy to explain to the user. Transparency about why an item was recommended can be a great boost to the user's confidence in trusting a rating.

Weaknesses: these algorithms do not work well on very sparse ratings matrices. They are also computationally expensive, because the entire user database needs to be processed to form recommendations. Finally, they do not work from a cold start: a new user has no historic data profile or ratings for the algorithm to start from.

Data Requirements: a user ratings profile, containing items they’ve rated/clicked/purchased. A "rating" can be defined however it fits the business use case.
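Because a "rating" can be defined however it fits the business use case, one option is to derive implicit ratings from interaction logs. The sketch below is purely illustrative: the event names, the EVENT_WEIGHTS values, and the keep-the-strongest-signal rule are all assumptions, not something the article prescribes.

```python
from collections import defaultdict

# Hypothetical mapping from interaction type to an implicit "rating".
# These weights are assumptions; pick whatever reflects your business use case.
EVENT_WEIGHTS = {"click": 1.0, "add_to_cart": 2.0, "purchase": 5.0}

def build_profiles(events):
    """Turn (user, item, event) tuples into {user: {item: rating}} profiles,
    keeping the strongest signal seen for each user-item pair."""
    profiles = defaultdict(dict)
    for user, item, event in events:
        score = EVENT_WEIGHTS.get(event)
        if score is None:
            continue  # ignore event types that have no weight defined
        profiles[user][item] = max(score, profiles[user].get(item, 0.0))
    return dict(profiles)

events = [
    ("alice", "book-1", "click"),
    ("alice", "book-1", "purchase"),
    ("alice", "book-2", "click"),
    ("bob", "book-2", "add_to_cart"),
    ("bob", "book-3", "view"),  # no weight defined, so it is skipped
]
profiles = build_profiles(events)
print(profiles)
```

The resulting per-user dictionaries are exactly the kind of ratings profile the neighborhood methods above consume.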


Content-based filtering (CBF)

The Content-based filtering (CBF) recommender is broken into three components:

1. A model class, TFIDFModel.

2. A model provider, TFIDFModelProvider, that computes TF-IDF vectors for items.

3. A scorer/recommender class that uses the precomputed model to compute user-personalized scores for items.
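The first two components might look like this in Python. The class names mirror the article's TFIDFModel and TFIDFModelProvider, but the fields, the get() method, the {item_id: [tags]} input format, and the TF-IDF variant used (raw tag count times log(N / document frequency)) are assumptions made for this sketch. A scorer would then read model.item_vectors.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class TFIDFModel:
    """Maps each item ID to its unit-normalised TF-IDF tag vector ({tag: weight})."""
    item_vectors: dict

class TFIDFModelProvider:
    """Builds a TFIDFModel from item tags given as {item_id: [tag, tag, ...]}."""

    def __init__(self, item_tags):
        self.item_tags = item_tags

    def get(self):
        n_items = len(self.item_tags)
        # Document frequency: how many items carry each tag at least once.
        df = Counter(tag for tags in self.item_tags.values() for tag in set(tags))
        idf = {tag: math.log(n_items / count) for tag, count in df.items()}
        vectors = {}
        for item, tags in self.item_tags.items():
            tf = Counter(tags)  # term frequency = raw tag count on this item
            vec = {tag: tf[tag] * idf[tag] for tag in tf}
            norm = math.sqrt(sum(w * w for w in vec.values()))
            # Normalise to unit length so later dot products are cosines.
            # A tag present on every item has idf 0, so it contributes nothing.
            vectors[item] = {t: w / norm for t, w in vec.items()} if norm else {}
        return TFIDFModel(vectors)

provider = TFIDFModelProvider({
    "m1": ["space", "space", "robots"],
    "m2": ["robots", "comedy"],
    "m3": ["comedy", "romance"],
})
model = provider.get()
print(model.item_vectors["m1"])
```

Keeping the expensive vector computation in a provider means it can run once, offline, while the scorer only does cheap lookups per request.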

TF-IDF Recommender with Unweighted Profiles

The model provider computes a unit-normalized TF-IDF vector for each item in the data set, so the model contains a mapping from item IDs to TF-IDF vectors normalized to unit length. In this unweighted variant, the user's profile vector is the sum of the vectors of the items the user rated positively. The heart of the recommendation process is the score method of the item scorer (the TFIDF Item Scorer), which scores each item by cosine similarity: the score for an item is the cosine between that item's tag vector and the user's profile vector.
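A minimal sketch of the unweighted-profile scorer, with sparse vectors stored as {tag: weight} dicts. The 3.5 cut-off for a "positive" rating, the function names, and the toy unit vectors are assumptions; the article does not say how a positive rating is defined.

```python
import math

def dot(a, b):
    """Dot product of two sparse vectors stored as {tag: weight} dicts."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def unweighted_profile(item_vectors, user_ratings, threshold=3.5):
    """Sum the TF-IDF vectors of the items the user rated at or above `threshold`."""
    profile = {}
    for item, rating in user_ratings.items():
        if rating >= threshold and item in item_vectors:
            for tag, w in item_vectors[item].items():
                profile[tag] = profile.get(tag, 0.0) + w
    return profile

def cosine_score(profile, item_vec):
    """Cosine between the user profile and an item's tag vector (0 if either is empty)."""
    denom = math.sqrt(dot(profile, profile)) * math.sqrt(dot(item_vec, item_vec))
    return dot(profile, item_vec) / denom if denom else 0.0

s = 1 / math.sqrt(2)
item_vectors = {           # toy unit-length tag vectors
    "m1": {"space": 1.0},
    "m2": {"space": s, "robots": s},
    "m3": {"romance": 1.0},
    "m4": {"robots": 1.0},
}
ratings = {"m1": 5.0, "m3": 2.0}  # hypothetical ratings; m3 falls below the cut-off
profile = unweighted_profile(item_vectors, ratings)
scores = {i: cosine_score(profile, v) for i, v in item_vectors.items() if i not in ratings}
print(scores)
```

Note that the low rating on m3 simply drops out of the profile; it does not push the user away from romance titles.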

Weighted User Profile

In this variant, rather than just summing the vectors of the positively-rated items, the profile is a weighted sum of the vectors of all rated items, with each weight based on the user's rating for that item.
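One way to realise the weighted variant is to weight each rated item's vector by how far its rating sits from the user's mean rating, so items rated below average pull the profile away from their tags. That specific weighting is an assumption (the article only says the weights are based on the user's rating), and the names and toy data are hypothetical.

```python
import math

def dot(a, b):
    """Dot product of two sparse {tag: weight} vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def weighted_profile(item_vectors, user_ratings):
    """Sum the vectors of all rated items, each weighted by (rating - user's mean rating)."""
    if not user_ratings:
        return {}
    mean = sum(user_ratings.values()) / len(user_ratings)
    profile = {}
    for item, rating in user_ratings.items():
        weight = rating - mean  # above-average ratings count positively, below-average negatively
        for tag, w in item_vectors.get(item, {}).items():
            profile[tag] = profile.get(tag, 0.0) + weight * w
    return profile

def cosine_score(profile, item_vec):
    """Cosine between the user profile and an item vector (0 if either is empty)."""
    denom = math.sqrt(dot(profile, profile)) * math.sqrt(dot(item_vec, item_vec))
    return dot(profile, item_vec) / denom if denom else 0.0

s = 1 / math.sqrt(2)
item_vectors = {
    "m1": {"space": 1.0},
    "m2": {"space": s, "robots": s},
    "m3": {"romance": 1.0},
    "m4": {"robots": 1.0},
    "m5": {"romance": s, "robots": s},
}
ratings = {"m1": 5.0, "m3": 1.0}  # hypothetical: loved m1, disliked m3
profile = weighted_profile(item_vectors, ratings)  # {"space": 2.0, "romance": -2.0}
scores = {i: cosine_score(profile, v) for i, v in item_vectors.items() if i not in ratings}
print(sorted(scores, key=scores.get, reverse=True))
```

Unlike the unweighted profile, a disliked item now actively lowers the score of similar items (m5 ends up with a negative score).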


More Algorithms to Learn

Exercises

For this week's practice, build a recommendation system on these Kaggle datasets:

The Movies Dataset

Santander Product Recommendation


More articles by Shivam Panchal

  • Best Resources for Data Science Enthusiasts- A Complete List

    Free Books R Python Libraries Libraries for Python Libraries for R Complete Beginner Resources ML, DL and RL in Python…

  • Machine Learning, Deep Learning and Artificial Intelligence Resources for all

    Here is a bunch of machine learning resources, thought I'd share it here. ★ are resources that were highly recommended…

  • Machine Learning 9: 'Sequential Rule Mining'

    Sequential Rule Mining is a data mining technique which consists of discovering rules in sequences. Sequential Rule…

  • Machine Learning 8: 'Clustering Algorithms'

    In the last week, we explored classification and Random Forest algorithm and that was a part of Supervised Machine…

  • Machine Learning 7:'Classification' Day 3

    In the last post, I discussed about Decision Tree. In this post, I will be discussing about Random Forest Algorithm…

  • Machine Learning 6:'Classification' Day 2

    Keep asking yes/no questions. With each question continue to significantly narrow down the space of possibly secrets.

  • Machine Learning : 'Classification' - Day 1

    In this post, we are starting off the classification, firstly, we will get into the difference between classification…

  • Machine Learning : 'Regression' - Day 4

    In this post which will be the last one on regression analysis, I will be discussing about the following topics in…

  • Machine Learning : 'Regression' - Day 3

    In the last to last post, we discussed about what is Regression and in the last one, we talked about the assumptions or…

  • Machine Learning : 'Regression' - Day 2

    Welcome to the post, I will not bore you much with the theory behind, I will try to put it as easy as possible. In this…
