Basic recommendation system based on matrix factorization technique using Keras (Open Source Neural Network library)

People have used recommendations for years as a way to show personalization and empathy. The very first question we ask a new acquaintance, 'What do you do?', is an input for personalizing the conversation and giving it meaningful context. A neighborhood storekeeper will often recommend a product or two that suits our taste, based on our past purchases, our conversations, and the feedback she hears about us from the neighbors.

Recommendation systems have now become the norm in almost all businesses, whether online or offline. They range from basic analytics, to statistical models, to complex deep-learning models. In this blog post, I address the use case of a B2B company whose products have a repeat-use pattern. The business has products that it sells to customers. As business owners, we know some level of correlation between the products we offer, and this knowledge becomes the basis of the first version of our recommendation engine: a basic algorithm that recommends products based on the product the user has selected.
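As a rough illustration of that first version, the sketch below hard-codes a hypothetical product-affinity table and recommends the products most related to the one a user has selected. The product names, the scores, and the `recommend_related` helper are assumptions made up for illustration, not data from a real catalog.

```python
# Hypothetical affinity scores a business owner might write down by hand.
# Keys are products; values map related products to a 0-1 affinity score.
PRODUCT_AFFINITY = {
    "printer": {"ink": 0.9, "paper": 0.8, "stapler": 0.2},
    "ink": {"printer": 0.9, "paper": 0.6},
    "paper": {"printer": 0.8, "ink": 0.6, "stapler": 0.4},
    "stapler": {"paper": 0.4, "printer": 0.2},
}


def recommend_related(selected, top_n=2):
    """Return up to top_n products most related to the selected product."""
    related = PRODUCT_AFFINITY.get(selected, {})
    ranked = sorted(related.items(), key=lambda kv: kv[1], reverse=True)
    return [product for product, _ in ranked[:top_n]]


print(recommend_related("printer"))  # ['ink', 'paper']
```

A table like this is easy to maintain while the catalog is small, but it encodes the owner's intuition rather than what customers actually buy.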

As the business grows, we start to think about using customers' past purchase data for product recommendations. How do you do this? One way is to analyze the historical data and build a statistical model; another is to use a machine learning technique (supervised or unsupervised). In most cases a statistical model is good enough, but when we have a lot of data points, deep learning is worth exploring. One basic technique of this kind is deep-learning matrix factorization, which can be implemented and tested very easily using Keras.
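One simple form the statistical route could take is counting how often two products are bought by the same customer. The sketch below assumes a tiny, made-up purchase history; the `purchase_history` data and both helper functions are hypothetical names used only for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase history: customer -> set of products bought.
purchase_history = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"A", "C"},
    "c4": {"A", "B", "D"},
}


def build_co_purchase_counts(history):
    """Count, for each product, how many customers also bought each other product."""
    counts = defaultdict(Counter)
    for basket in history.values():
        for a, b in permutations(sorted(basket), 2):
            counts[a][b] += 1
    return counts


def recommend(counts, product, top_n=2):
    """Return the top_n products most often bought together with `product`."""
    return [p for p, _ in counts[product].most_common(top_n)]


co_counts = build_co_purchase_counts(purchase_history)
print(recommend(co_counts, "A"))  # ['B', 'C']
```

Counts like these need no training, which is one reason a statistical model is often good enough before moving to deep learning.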

Keras is an open-source neural network library written in Python that runs on top of Theano, TensorFlow, and other similar frameworks. It is very simple to use. Generally speaking, any modeling script has six steps: 1) load data, 2) define the Keras model, 3) compile the model, 4) fit the model, 5) evaluate the model, and 6) make predictions.
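The six steps can be sketched on a toy regression problem. This is a minimal illustration rather than the recommendation model itself: the synthetic data, the layer sizes, and the epoch count are arbitrary assumptions, and it presumes Keras and NumPy are installed.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input

# 1) Load data (synthetic here: y is a linear function of three features)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)).astype("float32")
y = (X @ np.array([1.0, -2.0, 0.5], dtype="float32")).reshape(-1, 1)

# 2) Define the Keras model
model = Sequential([Input(shape=(3,)), Dense(8, activation="relu"), Dense(1)])

# 3) Compile the model
model.compile(optimizer="adam", loss="mse")

# 4) Fit the model
history = model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# 5) Evaluate the model (on the training data, only to keep the sketch short)
loss = model.evaluate(X, y, verbose=0)

# 6) Make predictions
preds = model.predict(X[:3], verbose=0)
print(loss, preds.shape)
```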

Below is a basic recommendation project using Python, pandas, scikit-learn, and Keras.

Data

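For simplicity, let us create a dataset of 5 products with a weekly purchase frequency per customer drawn from the range 1 to 5. Note that NumPy's randint excludes the upper bound, so the generated frequencies run from 1 to 4.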

import numpy as np
import pandas as pd

# Self-created data based on a hypothesis
n_rows = 10000000
half = n_rows // 2

# Column 1: customer IDs in the range 0-99
customer_id = np.random.randint(low=0, high=100, size=n_rows)

# Column 2: product classes; the first half is uniform over 0-4, the second half is always 4
product_class1 = np.random.randint(low=0, high=5, size=half)
product_class2 = np.random.randint(low=4, high=5, size=half)
product_class = np.append(product_class1, product_class2)

# Column 3: weekly transactions; the first half is uniform over 1-4, the second half is always 4
totalcharge_week1 = np.random.randint(low=1, high=5, size=half)
totalcharge_week2 = np.random.randint(low=4, high=5, size=half)
totalcharge_week = np.append(totalcharge_week1, totalcharge_week2)

# Merge the three columns into a list of tuples with zip() and build the DataFrame
list_of_tuples = list(zip(customer_id, product_class, totalcharge_week))
trans_data_created = pd.DataFrame(list_of_tuples, columns=['customer_id', 'product_class', 'totaltransaction_week'])

# Distinct users and products; their counts set the size of the embedding tables
users = trans_data_created.customer_id.unique()
product_class = trans_data_created.product_class.unique()

Recommendation model using matrix factorization technique

Let us create a simple model using Keras. Remember that for the algorithm to work, all features must be converted to numeric format.
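The synthetic data here is already numeric, but real customer or product identifiers are often strings. A minimal sketch of the conversion, assuming made-up customer codes and a hypothetical `build_index` helper, maps each distinct value to a contiguous integer index that an embedding layer can look up. scikit-learn's `LabelEncoder` does a similar job, except that it assigns indices in sorted order rather than in order of first appearance.

```python
def build_index(values):
    """Map each distinct value to an integer index, in order of first appearance."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index


# Hypothetical raw customer codes as they might appear in a transaction log.
raw_customers = ["ACME-17", "GLOBEX-02", "ACME-17", "INITECH-99"]

customer_index = build_index(raw_customers)
encoded = [customer_index[c] for c in raw_customers]
print(encoded)  # [0, 1, 0, 2]
```

Keeping the indices contiguous from 0 matters because the number of distinct values becomes the number of rows in the embedding table.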

It is a basic model in which we create one embedding for users and one for products. The dot product between a user vector and a product vector is the predicted transaction count. As the model trains, the embedding parameters are learned, giving us a latent representation of each user and product.
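Before the Keras version, the same idea can be sketched in plain NumPy: two small matrices of latent factors, a dot product as the prediction, and stochastic gradient descent on the squared error. The toy ratings, the factor size, and the learning rate are assumptions chosen for illustration, and this sketch omits the dense layers that the Keras model adds.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 3, 2

# Toy (user, item, value) triples; the values are made up for illustration.
ratings = [(0, 0, 4.0), (0, 1, 1.0), (1, 0, 5.0), (1, 2, 2.0),
           (2, 1, 4.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]

# Latent factors (the "embeddings"), initialised with small random values.
P = rng.normal(scale=0.1, size=(n_users, k))  # one row per user
Q = rng.normal(scale=0.1, size=(n_items, k))  # one row per item


def mse(P, Q):
    """Mean squared error of the dot-product predictions over the observed triples."""
    return float(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings]))


initial_loss = mse(P, Q)

lr = 0.02  # learning rate
for epoch in range(1000):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]      # prediction error for this pair
        p_old = P[u].copy()        # keep the old user vector for the item update
        P[u] += lr * err * Q[i]    # gradient step on the user factors
        Q[i] += lr * err * p_old   # gradient step on the item factors

final_loss = mse(P, Q)
print(initial_loss, final_loss)
```

After training, `P[u] @ Q[i]` gives a prediction for any user and item pair, including pairs that never appeared in the training triples.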

# Keras model: custom model created for matrix factorization
from keras.models import Model
from keras.layers import Input, Embedding, Reshape, Dense, Dot

# Each input carries one integer ID per sample
user_id_input = Input(shape=(1,), name='user')
item_id_input = Input(shape=(1,), name='item')

embedding_size = 30

# Embedding tables: one row per distinct user and per distinct product
user_embedding = Embedding(output_dim=embedding_size, input_dim=users.shape[0],
                           name='user_embedding')(user_id_input)
item_embedding = Embedding(output_dim=embedding_size, input_dim=product_class.shape[0],
                           name='item_embedding')(item_id_input)

# Flatten (batch, 1, embedding_size) to (batch, embedding_size)
user_vecs = Reshape((embedding_size,))(user_embedding)
item_vecs = Reshape((embedding_size,))(item_embedding)

# Dense layers transform each embedding before the dot product
user_vecs_dense = Dense(128, activation='relu')(user_vecs)
item_vecs_dense = Dense(128, activation='relu')(item_vecs)

# The prediction is the dot product of the user and item vectors
y = Dot(axes=1, normalize=False)([user_vecs_dense, item_vecs_dense])

model = Model(inputs=[user_id_input, item_id_input], outputs=y)

# The target is numeric, so track MAE; 'accuracy' is not meaningful for regression
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# summary() prints the layer table itself
model.summary()

To train the model, we simply call the model's fit method and watch where the MSE loss stabilizes.

from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing
train, test = train_test_split(trans_data_created, test_size=0.2)

history = model.fit([train["customer_id"].to_numpy(), train["product_class"].to_numpy()],
                    train["totaltransaction_week"].to_numpy(),
                    batch_size=500, epochs=50,
                    validation_split=0.2,
                    callbacks=None,
                    shuffle=True)
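The object returned by fit keeps the per-epoch losses in its `history` dictionary, for example under the keys "loss" and "val_loss". One way to decide where the loss has stabilized is sketched below; the `first_stable_epoch` helper, its tolerance, and the loss values are assumptions for illustration. Keras also ships an `EarlyStopping` callback that applies a similar rule during training.

```python
def first_stable_epoch(losses, tol=0.01, patience=3):
    """Return the 0-based epoch at which the loss has improved by less than
    `tol` (relative to the previous epoch) for `patience` epochs in a row,
    or None if that never happens."""
    streak = 0
    for epoch in range(1, len(losses)):
        prev, curr = losses[epoch - 1], losses[epoch]
        rel_improvement = (prev - curr) / prev if prev > 0 else 0.0
        streak = streak + 1 if rel_improvement < tol else 0
        if streak >= patience:
            return epoch
    return None


# Made-up validation losses standing in for history.history["val_loss"].
val_losses = [2.50, 1.40, 0.90, 0.70, 0.698, 0.697, 0.6965, 0.6964]
print(first_stable_epoch(val_losses))  # 6
```

If the training loss keeps falling while the validation loss flattens or rises, extra epochs are mostly fitting noise.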

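The last step is to predict and check the results.

# Inputs follow the model's [user, item] order, and product IDs must lie in 0-4,
# so this asks for the prediction for customer 41 and product 0
Xnew = [np.array([41]), np.array([0])]
ynew = model.predict(Xnew, batch_size=None, verbose=0, steps=None, callbacks=None)
print(ynew)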
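One way to turn individual predictions into a recommendation is to score every product for a single customer and keep the highest-scoring ones. The sketch below assumes the scores have already been collected into a dictionary; the `predicted` values and the `top_k_products` helper are hypothetical, not output from the model above.

```python
def top_k_products(scores, k=3, exclude=()):
    """Return the k product IDs with the highest predicted score,
    skipping any IDs listed in `exclude`."""
    candidates = {p: s for p, s in scores.items() if p not in exclude}
    return sorted(candidates, key=candidates.get, reverse=True)[:k]


# Made-up predicted weekly transactions for one customer, keyed by product class.
predicted = {0: 2.1, 1: 3.7, 2: 1.2, 3: 3.9, 4: 2.8}

print(top_k_products(predicted, k=2))               # [3, 1]
print(top_k_products(predicted, k=2, exclude={3}))  # [1, 4]
```

The `exclude` argument is optional because, for products with a repeat-use pattern, recommending something the customer already buys can still be the right answer.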

 
