Basic recommendation system based on matrix factorization technique using Keras (Open Source Neural Network library)

Jainendra Kumar, CPM, M.IOD

Member of Forbes Technology Council | Advisor | AI, ML, SaaS, Cloud, DevSecOps | Digital Transformation | Certified Independent Director

Published: September 29, 2019

People have long used recommendations to show personalization and empathy. The very first question we ask a new acquaintance, 'What do you do?', is an input for personalizing the conversation and giving it meaningful context. A neighborhood storekeeper will often recommend a product or two suited to our taste, based on our past purchases, our conversations with her, and the feedback she hears about us from the neighbors.

Recommendation systems have now become the norm in almost every business, online or offline. They range from basic analytics, through statistical models, to complex deep-learning models. In this blog post, I address the use case of a B2B company whose products show a repeat-usage pattern. As business owners, we already know some of the correlations between the products we offer, and this knowledge becomes the basis of the first version of our recommendation engine: a basic algorithm that recommends products based on the product the user has selected.
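That first version can be as simple as a hand-maintained lookup table. The sketch below is only an illustration: the product names, the `RELATED_PRODUCTS` mapping, and the `recommend_related` helper are all hypothetical and do not come from any real catalog.

```python
# Hypothetical, hand-curated map from a product to related products,
# ordered from most to least relevant. Names are illustrative only.
RELATED_PRODUCTS = {
    "payroll": ["time_tracking", "benefits"],
    "time_tracking": ["payroll", "scheduling"],
    "benefits": ["payroll"],
}


def recommend_related(selected_product, top_n=2):
    """Return up to top_n products related to the one the user selected."""
    return RELATED_PRODUCTS.get(selected_product, [])[:top_n]


print(recommend_related("payroll"))  # ['time_tracking', 'benefits']
print(recommend_related("unknown"))  # []
```

A table like this encodes what the business owner already knows, which is why it works before any purchase data has been collected.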

As the business grows, we start to think about using customers' past purchase data for product recommendations. How do we do this? One way is to analyze the historical data and build a statistical model; another is to use a machine learning technique (supervised or unsupervised). In most cases the statistical model is good enough, but when we have a lot of data points we may want to explore deep learning. One basic technique of this kind is matrix factorization with learned embeddings, which can easily be implemented and tested using Keras.
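As one possible reading of the "statistical model" option, the sketch below counts how often two products share a customer and recommends the most frequent companions. The purchase history and the `recommend_from_history` helper are made up for illustration; a real model would likely also normalize for product popularity.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase history: customer id -> set of products bought.
purchases = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"B", "C"},
    "c4": {"A", "B"},
}

# Count how often each ordered pair of products shares a customer.
co_counts = defaultdict(Counter)
for basket in purchases.values():
    for first, second in permutations(sorted(basket), 2):
        co_counts[first][second] += 1


def recommend_from_history(product, top_n=2):
    """Products most often bought by customers who also bought `product`."""
    return [name for name, _ in co_counts[product].most_common(top_n)]


print(recommend_from_history("A"))  # ['B', 'C']
print(recommend_from_history("C"))  # ['B', 'A']
```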

Keras is an open-source neural network library written in Python that runs on top of backends such as Theano or TensorFlow. It is very simple to use. Generally speaking, a modeling script has six steps: 1) load the data, 2) define the Keras model, 3) compile the model, 4) fit the model, 5) evaluate the model, and 6) make predictions.

Below is a basic recommendation project using Python, pandas, scikit-learn, and Keras.

Data

For simplicity, let us create a dataset of 5 products, with a weekly purchase frequency per customer between 1 and 5.

import numpy as np
import pandas as pd

# Self-created data based on a hypothesis
n_rows = 10000000
half = n_rows // 2

# List 1: customer ids 0 to 99 (randint's upper bound is exclusive)
customer_id = np.random.randint(low=0, high=100, size=n_rows)

# List 2: product classes 0 to 4; the second half is always 4, which skews the data toward that product
product_class1 = np.random.randint(low=0, high=5, size=half)
product_class2 = np.random.randint(low=4, high=5, size=half)
product_class = np.append(product_class1, product_class2)

# List 3: weekly purchase frequency 1 to 5 (high=6 because the upper bound is exclusive); the second half is always 4
totalcharge_week1 = np.random.randint(low=1, high=6, size=half)
totalcharge_week2 = np.random.randint(low=4, high=5, size=half)
totalcharge_week = np.append(totalcharge_week1, totalcharge_week2)

# Merge the three lists into a list of tuples using zip()
list_of_tuples = list(zip(customer_id, product_class, totalcharge_week))
trans_data_created = pd.DataFrame(list_of_tuples, columns=['customer_id', 'product_class', 'totaltransaction_week'])

# Distinct customers and products; their counts set the embedding input sizes
users = trans_data_created.customer_id.unique()
product_class = trans_data_created.product_class.unique()
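The data-generation code above depends on a detail of `numpy.random.randint` that is easy to miss: the `high` bound is exclusive. A call with `low=4, high=5` therefore always returns 4, which is what skews half of the rows toward a single value. The short check below illustrates this on a small sample; the variable names are only for this example.

```python
import numpy as np

# With an exclusive upper bound, the only integer in [4, 5) is 4.
skewed = np.random.randint(low=4, high=5, size=1000)
print(np.unique(skewed))  # [4]

# To draw values from 1 through 5 inclusive, the upper bound has to be 6.
values = np.random.randint(low=1, high=6, size=1000)
print(values.min() >= 1 and values.max() <= 5)  # True
```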

Recommendation model using matrix factorization technique

Let us create a simple model using Keras. Remember that for the algorithm to work, all features have to be converted to numeric format.
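The synthetic ids in this post are already integers, but real identifiers usually are not. Below is a minimal sketch of the conversion, assuming hypothetical string customer names. It uses `np.unique` with `return_inverse=True`; scikit-learn's `LabelEncoder` produces the same kind of mapping.

```python
import numpy as np

# Hypothetical raw identifiers, as they might arrive from a sales system.
raw_customers = np.array(["acme", "globex", "acme", "initech", "globex"])

# np.unique sorts the distinct labels; return_inverse gives each row's index
# into that sorted array, i.e. a contiguous integer code starting at 0.
labels, codes = np.unique(raw_customers, return_inverse=True)

print(labels)  # ['acme' 'globex' 'initech']
print(codes)   # [0 1 0 2 1]
```

Contiguous codes starting at 0 matter because an embedding layer uses each id as a row index, so the largest id must be smaller than the number of rows.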

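It is a basic model in which we create one embedding for the users and one for the products. The dot product between a user vector and a product vector is the predicted transaction count. When we train the model, the embedding parameters are learned, which gives us a latent representation of each user and product. Note that the code below passes each embedding through a Dense layer before taking the dot product, so it is a slight extension of plain matrix factorization.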
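To make the idea concrete without any deep learning library, here is a plain NumPy sketch of matrix factorization trained with stochastic gradient descent. It is not the author's model: the toy observations, the embedding size `k`, the learning rate, and the names `P`, `Q`, `predict`, and `mse` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: (user, product, weekly purchase count) triples.
observations = np.array([
    [0, 0, 5], [0, 1, 4], [1, 0, 5], [1, 2, 1],
    [2, 1, 4], [2, 2, 1], [3, 0, 1], [3, 2, 5],
])
users_idx, items_idx = observations[:, 0], observations[:, 1]
targets = observations[:, 2].astype(float)

n_users, n_items, k = 4, 3, 2
P = rng.normal(scale=0.1, size=(n_users, k))  # user embeddings
Q = rng.normal(scale=0.1, size=(n_items, k))  # product embeddings


def predict(u, i):
    """Predicted value is the dot product of the two embedding rows."""
    return np.sum(P[u] * Q[i], axis=-1)


def mse():
    """Mean squared error over all observed triples."""
    return float(np.mean((predict(users_idx, items_idx) - targets) ** 2))


initial_loss = mse()
learning_rate = 0.02
for _ in range(1000):
    for u, i, y in zip(users_idx, items_idx, targets):
        error = predict(u, i) - y
        # Gradients of 0.5 * error**2 with respect to each embedding row.
        grad_p, grad_q = error * Q[i], error * P[u]
        P[u] -= learning_rate * grad_p
        Q[i] -= learning_rate * grad_q
final_loss = mse()

print(final_loss < initial_loss)
```

The rows of `P` and `Q` play the role of the embedding layers: each user and each product gets a small vector, and training only adjusts those vectors so that their dot products approach the observed counts.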

# Keras model: custom model created for matrix factorization
from keras.models import Model
from keras.layers import Input, Embedding, Reshape, Dense, Dot

user_id_input = Input(shape=(1,), name='user')
item_id_input = Input(shape=(1,), name='item')

embedding_size = 30

# One embedding row per distinct customer and per distinct product
user_embedding = Embedding(output_dim=embedding_size, input_dim=users.shape[0],
                           name='user_embedding')(user_id_input)
item_embedding = Embedding(output_dim=embedding_size, input_dim=product_class.shape[0],
                           name='item_embedding')(item_id_input)

# Drop the length-1 sequence axis: (batch, 1, 30) -> (batch, 30)
user_vecs = Reshape((embedding_size,))(user_embedding)
item_vecs = Reshape((embedding_size,))(item_embedding)

user_vecs_dense = Dense(128, activation='relu')(user_vecs)
item_vecs_dense = Dense(128, activation='relu')(item_vecs)

# The dot product of the two vectors is the predicted weekly transaction count
y = Dot(axes=1, normalize=False)([user_vecs_dense, item_vecs_dense])

model = Model(inputs=[user_id_input, item_id_input], outputs=y)
# Accuracy is not meaningful for a regression target, so track mean absolute error instead
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.summary()

To train the model, we simply call the model's fit method and watch the MSE loss to see where it stabilizes.
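One way to make "where it stabilizes" concrete is to look for the first run of epochs in which the loss barely improves. The helper below is a hypothetical sketch that works on any list of per-epoch losses, such as the list Keras stores under `history.history["val_loss"]`. The tolerance and patience values are arbitrary assumptions, and the loss values are made up.

```python
def first_stable_epoch(losses, tolerance=1e-3, patience=3):
    """Return the index of the first epoch in a run of `patience` consecutive
    epochs that each improve on the previous epoch by less than `tolerance`,
    or None if no such run exists."""
    calm_epochs = 0
    for epoch in range(1, len(losses)):
        improvement = losses[epoch - 1] - losses[epoch]
        calm_epochs = calm_epochs + 1 if improvement < tolerance else 0
        if calm_epochs >= patience:
            return epoch - patience + 1
    return None


# Made-up loss values standing in for history.history["val_loss"].
example_losses = [2.10, 1.40, 1.05, 1.0500, 1.0498, 1.0497, 1.0497]
print(first_stable_epoch(example_losses))  # 3
```

Keras also ships an `EarlyStopping` callback with `monitor`, `min_delta`, and `patience` arguments that stops training on a similar criterion, which may be preferable to running all 50 epochs.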

from sklearn.model_selection import train_test_split

# Hold out 20% of the rows as a test set
train, test = train_test_split(trans_data_created, test_size=0.2)

history = model.fit([train["customer_id"].to_numpy(), train["product_class"].to_numpy()],
                    train["totaltransaction_week"].to_numpy(),
                    batch_size=500, epochs=50,
                    validation_split=0.2,
                    shuffle=True)

The last step is to predict and check results.

# Inputs follow the model's input order: [customer ids, product ids]; product ids must lie in 0 to 4 here
Xnew = [np.array([41]), np.array([0])]
ynew = model.predict(Xnew, verbose=0)
print(ynew)
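A single prediction becomes a recommendation once the scores for all products are ranked for one customer. The sketch below assumes the scores have already been obtained, for example by calling `model.predict` with the same customer id repeated against product ids 0 to 4. The `top_k_products` helper and the score values are hypothetical.

```python
import numpy as np


def top_k_products(scores, k=3):
    """Return product ids ordered from highest to lowest predicted score."""
    scores = np.asarray(scores).ravel()
    return np.argsort(-scores)[:k].tolist()


# Made-up scores standing in for model.predict over products 0..4 for one customer.
example_scores = [[2.1], [3.8], [1.2], [3.1], [4.0]]
print(top_k_products(example_scores))  # [4, 1, 3]
```

Negating the scores before `np.argsort` gives a descending order, and `ravel()` flattens the `(n_products, 1)` shape that a model with a single output value returns.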
