BxD Primer Series: Matrix Factorization Recommendation Models
Hey there!
Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on Matrix Factorization Recommendation Models. Let’s get started:
The What:
Matrix factorization is a model-based technique used in recommendation systems to uncover the latent factors that underlie the observed user-item interactions in a dataset (check our post on building a user-item interaction matrix here).
The factorization process involves breaking down the user-item matrix into two separate matrices: a user matrix and an item matrix. Each row of the user matrix represents a user, and each column represents a latent factor. Each row of the item matrix represents an item, and each column represents a latent factor. The dot product of a user vector and an item vector gives an estimate of that user's rating for the item.
Matrix factorization models are typically trained on a dataset of user-item interactions, such as ratings or clicks. The goal is to learn the factor matrices that minimize the error between predicted and actual ratings in the training data. Once the model is trained, it can make personalized recommendations by predicting a user's ratings for items they haven't yet interacted with.
The How:
Below are the general steps involved in training a matrix factorization model:
Note: Measuring recommendation system performance is already covered in this post.
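As an illustrative sketch of such a training loop (assuming squared-error loss minimized by SGD over the observed ratings; the function name, hyperparameters, and example matrix below are my own, not from this post):

```python
import numpy as np

def train_mf(R, k=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Factorize rating matrix R (0 = missing) into U (m x k) and V (n x k)
    by SGD on squared error over the observed entries, with L2 regularization."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = 0.1 * rng.standard_normal((m, k))
    V = 0.1 * rng.standard_normal((n, k))
    observed = np.argwhere(R > 0)                # (user, item) index pairs
    for _ in range(epochs):
        rng.shuffle(observed)                    # visit ratings in random order
        for i, j in observed:
            ui = U[i].copy()
            err = R[i, j] - ui @ V[j]            # error on one known rating
            U[i] += lr * (err * V[j] - reg * ui)
            V[j] += lr * (err * ui - reg * V[j])
    return U, V

# Tiny user-item matrix: rows = users, cols = items, 0 = unrated.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
U, V = train_mf(R)
pred = U @ V.T          # fills in predictions for the unrated cells too
```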
Matrix Factorization Algorithms:
We have explained four major matrix factorization methods in this section:
Singular Value Decomposition (SVD)
In a rating matrix R of size m×n, where m is the number of users and n is the number of items, each entry r_ij represents the rating of user i for item j.
SVD decomposes matrix R into three matrices U, Σ, and V, such that R = UΣV^T
U represents user factors, V represents item factors, and Σ is a diagonal matrix whose entries represent the strengths of the factors. The number of retained factors, k, is a parameter that needs to be optimized using a cost function and an optimization technique.
Rating r_ij for user i and item j is predicted as follows:

r̂_ij = (UΣV^T)_ij = Σ_{f=1..k} u_if · σ_f · v_jf

Where,

- u_if is the f-th latent factor of user i (from row i of U)
- σ_f is the f-th singular value (the f-th diagonal entry of Σ)
- v_jf is the f-th latent factor of item j (from row j of V)
SVD may not be well-suited to data with missing values or highly sparse data. Often the k value is selected empirically from the data, but using Mean Squared Error (MSE) as the cost function with SGD as the optimization technique is a better option.
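A minimal SVD-based prediction sketch with NumPy (mean-imputing the missing entries first, since plain SVD needs a complete matrix; the example matrix and k are illustrative):

```python
import numpy as np

# Ratings with 0 = missing; impute each missing cell with that user's mean.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
row_means = np.nanmean(np.where(mask, R, np.nan), axis=1, keepdims=True)
R_filled = np.where(mask, R, row_means)

# Full SVD, then keep only the top-k singular values/vectors.
U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
k = 2                                            # number of latent factors kept
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # rank-k rating estimates

print(R_hat[0, 2])   # predicted rating of user 0 for (unrated) item 2
```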
Alternating Least Squares (ALS)
ALS factorizes a sparse user-item interaction matrix R into two lower-dimensional matrices: a user latent feature matrix U (m×k) and an item latent feature matrix V (n×k). It does not require an external optimization technique and uses mean squared error as its default cost function.
It works by iteratively solving for either U or V while fixing the other matrix.
Fix V, solve for U:

u_i = (V^T V + λI)^(-1) V^T r_i

Where,

- u_i is the k-dimensional latent vector of user i
- r_i is the i-th row of R, restricted to the items user i has rated (V is likewise restricted to those items' rows)
- λ is a regularization parameter and I is the k×k identity matrix
Fix U, solve for V:

v_j = (U^T U + λI)^(-1) U^T r_j

Where,

- v_j is the k-dimensional latent vector of item j
- r_j is the j-th column of R, restricted to the users who rated item j (U is likewise restricted to those users' rows)
ALS is effective for sparse datasets and is computationally efficient.
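The two alternating closed-form solves above can be sketched as follows (a minimal dense implementation; λ, k, and the example matrix are illustrative):

```python
import numpy as np

def als(R, k=2, reg=0.1, iters=20, seed=0):
    """ALS on a rating matrix where 0 marks a missing entry. Each user
    (item) vector is the closed-form ridge-regression solution over that
    user's (item's) observed ratings."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    mask = R > 0
    I = np.eye(k)
    for _ in range(iters):
        for i in range(m):                       # fix V, solve for u_i
            obs = mask[i]
            if obs.any():
                Vo = V[obs]
                U[i] = np.linalg.solve(Vo.T @ Vo + reg * I, Vo.T @ R[i, obs])
        for j in range(n):                       # fix U, solve for v_j
            obs = mask[:, j]
            if obs.any():
                Uo = U[obs]
                V[j] = np.linalg.solve(Uo.T @ Uo + reg * I, Uo.T @ R[obs, j])
    return U, V

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
U, V = als(R)
pred = U @ V.T          # predicted ratings, including the missing cells
```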
Note: If a constraint is added that all elements of U and V must be non-negative (positive or zero), ALS becomes Non-negative Matrix Factorization (NMF), which is useful where interpretability of the factor values matters, in scenarios such as image processing or text analysis.
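A minimal NMF sketch using Lee–Seung multiplicative updates (one common NMF algorithm, not necessarily the constrained-ALS variant described above; the data and hyperparameters are illustrative):

```python
import numpy as np

def nmf(X, k=2, iters=500, seed=0, eps=1e-9):
    """NMF via Lee–Seung multiplicative updates: minimizes ||X - W H||_F^2
    while keeping every element of W and H non-negative."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # Multiplicative updates preserve non-negativity by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# A non-negative matrix with an exact rank-2 non-negative factorization.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 1.0],
              [3.0, 5.0, 7.0]])
W, H = nmf(X, k=2)
```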
Probabilistic Matrix Factorization (PMF)
In PMF, each entry of user-item interaction matrix is modeled as a Gaussian distribution with a mean and variance that depend on latent factors. In other words, instead of modeling the user-item interaction matrix as a deterministic matrix, PMF models each entry as a probabilistic distribution. This allows PMF to provide a more accurate estimate of the probability that a user will interact with an item they have not yet seen.
For a mathematical understanding of this concept, read our previous edition on Bayes models here and watch this video on Coursera here.
In summary, the steps for PMF are as follows:
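For reference, under Gaussian observation noise and zero-mean Gaussian priors on the latent vectors, the MAP estimate in PMF reduces to minimizing a regularized squared error (a standard result; here Ω denotes the set of observed entries, and σ², σ_U², σ_V² are the noise and prior variances):

```latex
\min_{U, V} \;\; \frac{1}{2} \sum_{(i,j) \in \Omega} \left( r_{ij} - u_i^{\top} v_j \right)^2
\;+\; \frac{\lambda_U}{2} \sum_{i} \lVert u_i \rVert^2
\;+\; \frac{\lambda_V}{2} \sum_{j} \lVert v_j \rVert^2,
\qquad \lambda_U = \frac{\sigma^2}{\sigma_U^2}, \quad \lambda_V = \frac{\sigma^2}{\sigma_V^2}
```

This is why PMF is often trained with the same SGD machinery as plain regularized matrix factorization, with the prior variances acting as regularization strengths.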
Bayesian Personalized Ranking Matrix Factorization (BPRMF)
BPRMF is recommended for use with binary preference data, such as like/dislike or click/no-click data. It is not suitable for use with explicit rating data, where the goal is to predict a numerical rating score.
Check the comprehensive paper explaining the mathematics of BPRMF here.
In summary, the steps for BPRMF are as follows:
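A minimal BPR-MF sketch (pairwise sampling with SGD on the BPR objective; the dataset, function name, and hyperparameters are illustrative, not from the paper):

```python
import numpy as np

def train_bpr(pos, n_users, n_items, k=2, lr=0.05, reg=0.01,
              steps=20000, seed=0):
    """BPR-MF on implicit feedback. `pos` maps each user to the set of
    items they interacted with. Each step samples a (user, observed item,
    unobserved item) triple and ascends ln sigmoid(x_ui - x_uj) with L2
    regularization, so observed items get ranked above unobserved ones."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    users = [u for u in pos if pos[u]]
    for _ in range(steps):
        u = users[rng.integers(len(users))]
        items = tuple(pos[u])
        i = items[rng.integers(len(items))]
        j = int(rng.integers(n_items))
        while j in pos[u]:                       # resample until unobserved
            j = int(rng.integers(n_items))
        uu, vi, vj = U[u].copy(), V[i].copy(), V[j].copy()
        x = np.clip(uu @ (vi - vj), -35, 35)     # x_uij = x_ui - x_uj
        g = 1.0 / (1.0 + np.exp(x))              # derivative of ln sigmoid
        U[u] += lr * (g * (vi - vj) - reg * uu)
        V[i] += lr * (g * uu - reg * vi)
        V[j] += lr * (-g * uu - reg * vj)
    return U, V

# Two taste clusters: users 0-1 click items 0-1, users 2-3 click items 2-3.
pos = {0: {0, 1}, 1: {0, 1}, 2: {2, 3}, 3: {2, 3}}
U, V = train_bpr(pos, n_users=4, n_items=4)
scores = U @ V.T        # higher score = item ranked higher for that user
```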
The Why:
Reasons to use matrix factorization technique for recommendation models:
The Why Not:
Reasons to not use matrix factorization:
Time for you to support:
In coming posts, we will cover one more recommendation model: Hybrid Recommender Systems.
After that, we will start with time series models such as ARIMA, Exponential Smoothing (ES), SARIMA, Vector Autoregression (VAR), Prophet, and Hidden Markov Models.
Let us know your feedback!
Until then,
Have a great time!