BxD Primer Series: Matrix Factorization Recommendation Models

Hey there!

Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a 'one-post-one-topic' format. Today's post is on Matrix Factorization Recommendation Models. Let's get started:

The What:

Matrix factorization is a model-based technique used in recommendation systems to uncover latent factors that underlie the observed user-item interactions in a dataset (check our post on building a user-item interactions matrix here).

The factorization process breaks the user-item matrix down into two separate matrices: a user matrix and an item matrix. Each row of the user matrix represents a user, and each column represents a latent factor. Each row of the item matrix represents an item, and each column represents a latent factor. The dot product of a user vector and an item vector gives an estimate of that user's rating for the item.
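
As a minimal sketch of that mechanic (the factor values below are made up for illustration; a real model learns them from data):

import numpy as np

# 3 users x 4 items with k = 2 latent factors (values are illustrative).
user_factors = np.array([[0.9, 0.1],   # user 0
                         [0.2, 0.8],   # user 1
                         [0.5, 0.5]])  # user 2
item_factors = np.array([[1.0, 0.0],   # item 0
                         [0.0, 1.0],   # item 1
                         [0.7, 0.3],   # item 2
                         [0.4, 0.6]])  # item 3

# Predicted rating of user i for item j is the dot product of their vectors.
predictions = user_factors @ item_factors.T   # shape (3, 4)
print(predictions[0, 2])                      # user 0's estimate for item 2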

Matrix factorization models are typically trained on a dataset of user-item interactions, such as ratings or clicks. The goal is to learn the factor matrices that minimize the error between predicted and actual ratings in the training data. Once the model is trained, it can be used to make personalized recommendations by predicting a user's ratings for items they haven't yet interacted with.

The How:

Below are the general steps involved in training a matrix factorization model:

  1. Data Preparation: This typically involves extracting user-item interactions and any additional information about users and items that may be relevant for the recommendation task.
  2. Data Splitting: Split the data into training and testing sets. The training set is used to train the model, while the testing set is used to evaluate its performance.
  3. Model Type Selection: Choose a matrix factorization algorithm and a suitable cost function for the task. There are several options to choose from, as described in a later section.
  4. Model Training: Train the model using the training set. This involves optimizing the chosen cost function with an optimization algorithm such as SGD, Conjugate Gradient, Adam, or L-BFGS, as appropriate (a minimal SGD sketch follows this list).
  5. Hyper-parameter Tuning: Hyper-parameters are set before training the model, such as the learning rate, number of factors, and regularization strength. Select the values that give the best performance on a validation set.
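
For concreteness, here is a minimal sketch of step 4, assuming a plain squared-error cost with L2 regularization (the function name and hyper-parameter defaults are illustrative, not tuned):

import numpy as np

def train_mf_sgd(ratings, n_users, n_items, k=10, lr=0.01, reg=0.1, epochs=20):
    # ratings: list of (user, item, rating) triples, i.e. the observed entries.
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_users, k))  # user factor matrix
    V = rng.normal(scale=0.1, size=(n_items, k))  # item factor matrix
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ V[i]        # prediction error on this entry
            u_old = U[u].copy()          # keep old value for V's update
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V

# Toy usage: 2 users, 3 items, 4 observed ratings.
U, V = train_mf_sgd([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0)],
                    n_users=2, n_items=3)
print(U @ V.T)   # predicted ratings for every user-item pair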

Note: Measuring recommendation system performance is already covered in this post.

Matrix Factorization Algorithms:

We have explained four major matrix factorization methods in this section:

Singular Value Decomposition (SVD)

In a rating matrix R of size m×n, where m is the number of users and n is the number of items, each entry r_ij represents the rating of user i for item j.

SVD decomposes matrix?R?into three matrices: U, Σ, V, such that R=UΣV^T

U represents user factors, V represents item factors, and Σ represents the strengths of those factors. The dimension of Σ is a parameter k that needs to be optimized using a cost function and optimization technique.

The rating r_ij for user i and item j is predicted as follows:

r̂_ij = Σ_{f=1}^{k} u_if · σ_f · v_jf

Where,

  • u_if is the f'th element of the i'th row of U
  • v_jf is the f'th element of the j'th row of V
  • σ_f is the f'th diagonal element of Σ
  • Parameter k is the number of latent factors used in the factorization

SVD may not be well-suited for data with missing values or data that is highly sparse. Often, the value of k is selected empirically based on the data, but using Mean Squared Error (MSE) as the cost function and SGD as the optimization technique is a better option.
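
A minimal sketch of rank-k SVD prediction with NumPy (it assumes missing entries have already been imputed, e.g. with user means, since plain SVD cannot skip missing cells; all values below are toy data):

import numpy as np

# 4 users x 3 items; missing entries are assumed already imputed.
R = np.array([[5.0, 3.0, 4.0],
              [4.0, 2.0, 5.0],
              [1.0, 5.0, 2.0],
              [2.0, 4.0, 1.0]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2                                            # number of latent factors kept
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # rank-k approximation of R
print(R_hat[0, 1])                               # predicted rating: user 0, item 1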

Alternating Least Squares (ALS)

ALS factorizes a sparse user-item interaction matrix R into two lower-dimensional matrices: a user latent feature matrix U (m×k) and an item latent feature matrix V (n×k). It does not require a separate optimization algorithm and uses mean squared error as the default cost function.

It works by iteratively solving for either U or V while fixing the other matrix.

Fix?V, solve for?U:

U_{i,:} = r_i V (V^T V + lambda_U I)^{-1}

Where,

  • r_i is the row vector of user i in the rating matrix R
  • U_{i,:} is the row vector of user i in the latent feature matrix U
  • lambda_U is a regularization parameter that controls the complexity of the model

Fix?U, solve for?V:

V_{:,j} = (U^T U + lambda_V I)^{-1} U^T r_j

Where,

  • r_j is the column vector of item j in the rating matrix R
  • V_{:,j} is the column vector of item j in the latent feature matrix V
  • lambda_V is a regularization parameter that controls the complexity of the model

ALS is effective for sparse datasets and is computationally efficient.

Note: If a constraint is added that all elements of U and V must be non-negative, ALS becomes Non-negative Matrix Factorization (NMF), which is useful in models where interpretability of the factor values is important, in scenarios such as image processing or text analysis.
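
A minimal ALS sketch in NumPy, directly implementing the two alternating updates above (for simplicity it treats every entry of R as observed; production ALS restricts each least-squares solve to the observed entries only):

import numpy as np

def train_als(R, k=2, lam=0.1, iters=20):
    m, n = R.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(m, k))   # user latent factors (m x k)
    V = rng.normal(scale=0.1, size=(n, k))   # item latent factors (n x k)
    I = np.eye(k)
    for _ in range(iters):
        # Fix V, solve for U: ridge-regression solution for every user row.
        U = R @ V @ np.linalg.inv(V.T @ V + lam * I)
        # Fix U, solve for V: same closed-form step for every item.
        V = R.T @ U @ np.linalg.inv(U.T @ U + lam * I)
    return U, V

R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 5.0, 4.0]])
U, V = train_als(R)
print(U @ V.T)   # reconstructed / predicted ratings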

Probabilistic Matrix Factorization (PMF)

In PMF, each entry of the user-item interaction matrix is modeled as a Gaussian distribution with a mean and variance that depend on the latent factors. In other words, instead of modeling the user-item interaction matrix as a deterministic matrix, PMF models each entry as a probability distribution. This allows PMF to provide a more accurate estimate of the probability that a user will interact with an item they have not yet seen.

For a mathematical understanding of this concept, read our previous edition on Bayes models here and watch this video on Coursera here.

In summary, the steps for PMF are as follows:

  1. Define the likelihood function based on observed ratings.
  2. Define the prior distribution over model parameters.
  3. Estimate the model parameters by maximizing the posterior distribution (see the objective below).
  4. Use the estimated parameters to predict ratings for the missing entries in the rating matrix.
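
For reference, under the standard PMF formulation (a Gaussian likelihood over observed ratings with zero-mean Gaussian priors on the factors), maximizing the posterior is equivalent to minimizing a regularized squared error; the notation below reuses the ALS section's symbols, with lambda_U and lambda_V being ratios of the rating noise variance to the prior variances:

E = Σ_{observed (i,j)} (r_ij − U_{i,:} · V_{:,j})² + lambda_U ||U||² + lambda_V ||V||²

This is why PMF can be trained with the same SGD machinery as plain matrix factorization.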

Bayesian Personalized Ranking Matrix Factorization (BPRMF)

BPRMF is recommended for use with binary preference data, such as like/dislike or click/no-click data. It is not suitable for use with explicit rating data, where the goal is to predict a numerical rating score.

Check the comprehensive paper explaining the mathematics of BPRMF here.

In summary, the steps for BPRMF are as follows:

  1. Randomly initialize user and item latent vectors.
  2. For each positive user-item interaction in the training data, sample a negative item uniformly at random from the set of items that the user has not interacted with.
  3. Compute the difference between the predicted scores of the positive and negative items, where each prediction is the dot product of the corresponding latent vectors.
  4. Update the user and item latent vectors using stochastic gradient descent (SGD) to minimize the negative log-likelihood of the observed interactions and the sampled negative interactions (a minimal update sketch follows this list).
  5. During training, Bayes' rule is used to calculate the posterior distribution over the parameters given the observed data. This helps the model handle uncertainty and avoid overfitting.
  6. Make predictions for new user-item pairs with the trained model by computing the dot product of their latent vectors and using the resulting score as the predicted preference or ranking.
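
A minimal sketch of steps 2-4 as a single SGD update (the function name and hyper-parameters are illustrative; U and V are the user and item factor matrices):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_step(U, V, u, i, j, lr=0.01, reg=0.01):
    # u: user, i: positive (observed) item, j: sampled negative item.
    x_uij = U[u] @ V[i] - U[u] @ V[j]   # difference of predicted scores
    g = sigmoid(-x_uij)                 # gradient weight of ln sigmoid(x_uij)
    u_old = U[u].copy()                 # keep old value for the item updates
    U[u] += lr * (g * (V[i] - V[j]) - reg * U[u])
    V[i] += lr * (g * u_old - reg * V[i])
    V[j] += lr * (-g * u_old - reg * V[j])

# One illustrative step: 2 users, 3 items, k = 2 factors.
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(2, 2))
V = rng.normal(scale=0.1, size=(3, 2))
bpr_step(U, V, u=0, i=1, j=2)   # user 0 preferred item 1 over unseen item 2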

The Why:

Reasons to use the matrix factorization technique for recommendation models:

  1. Can handle large and sparse datasets efficiently.
  2. Reduces the dimensionality of the original dataset by representing it in terms of latent factors, which reduces computational time compared to neighborhood-based collaborative filtering.
  3. Unlike simpler methods such as neighborhood-based collaborative filtering or content-based filtering, matrix factorization can discover hidden patterns and interactions between users and items that may not be apparent from the raw data.
  4. Since it relies on latent factors rather than explicit features, it can make reasonable predictions for new users or items with relatively little data.
  5. Side information about users or items, such as demographics or textual descriptions, can be incorporated into the model as additional features.
  6. It can be used as a pre-processing step to extract meaningful features for other algorithms like decision trees or neural networks.

The Why Not:

Reasons to not use matrix factorization:

  1. Matrix factorization is prone to overfitting and underfitting, particularly for datasets with small numbers of users or items.
  2. Since it involves latent factors that are not directly related to the original features, it can be difficult to interpret the learned parameters or understand how they relate to user preferences or item characteristics.
  3. Matrix factorization is a model-based approach, which means it needs to be retrained regularly to adjust for changes in user-item interactions.
  4. Matrix factorization is sensitive to hyper-parameter selection (number of latent factors, regularization parameters).

Time for you to support:

  1. Reply to this article with your question
  2. Forward/Share to a friend who can benefit from this
  3. Chat on Substack with BxD (here)
  4. Engage with BxD on LinkedIn (here)

In an upcoming post, we will cover one more recommendation model: Hybrid Recommender Systems.

After that, we will start with time series models such as ARIMA, Exponential Smoothing (ES), SARIMA, Vector Autoregression (VAR), Prophet, and Hidden Markov Models.

Let us know your feedback!

Until then,

Have a great time!

#businessxdata #bxd #matrix #factorization #recommendationsystems #primer
