Newsletter: June 2023

How to use word embeddings for product recommendations?

What is a word embedding?

A word embedding is a numerical representation of a word that captures semantic relationships between words, improving a system's understanding of text and the relevance of search results. Words are represented as dense vectors in a continuous vector space, produced by a variety of methods; the most common is Word2Vec, which is trained on large text corpora.

For example, in a 3-dimensional space we can assign the following vectors to the given words:

Dimensions: Age, Gender, Size

Cow: 0.7, 1, 0.65

Calf: 0.35, 0, 0.25

Dog: 1, 1, 0.25

Puppy: 0.25, 0, 0.125

Computer: 0, 0, 0.15

In real-world applications, we typically use 200 to 300 dimensions to generate vectors for words.
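The toy example above can be sketched in a few lines of pure Python. The vectors are the hand-assigned illustrative values from the table; the cosine similarity function is one standard way to compare embeddings:

```python
import math

# Toy 3-dimensional "embeddings": (age, gender, size), hand-assigned for illustration.
words = {
    "cow":      [0.70, 1.0, 0.65],
    "calf":     [0.35, 0.0, 0.25],
    "dog":      [1.00, 1.0, 0.25],
    "computer": [0.00, 0.0, 0.15],
}

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Semantically related animals score higher than an unrelated word.
print(cosine(words["cow"], words["calf"]))      # higher: both are cattle
print(cosine(words["cow"], words["computer"]))  # lower: unrelated concepts
```

Even in three dimensions, the geometry already encodes relatedness: "cow" sits closer to "calf" than to "computer".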

How to create product embeddings?

Let's say we have a table of products with the fields: title, price, average rating, number of reviews, brand, characters, categories, and search_tags.

[Image: Product Information]

An aggregation method such as averaging or pooling can be used to combine the word embeddings of a product's fields into a single product embedding.

[Image: Assumed Embedding Vectors]

For each product, average the word embeddings of the title, brand, characters, categories, and search tags:

(title_embedding + brand_embedding + character_embedding + 
category_embedding + tag1_embedding + tag2_embedding + tag3_embedding) / 7

= ([0.2, 0.4, -0.1, 0.6] + [-0.3, 0.1, 0.5, -0.2] + [0.5, -0.3, 0.2, 0.7] +
 [0.1, -0.2, -0.5, 0.3] + [0.4, -0.1, 0.2, 0.6] + [-0.2, 0.5, 0.1, 0.4] +
 [-0.5, -0.3, -0.2, 0.1]) / 7 = [0.2, 0.1, 0.2, 2.5] / 7

= [0.0286, 0.0143, 0.0286, 0.3571]

This average produces a single vector representation for “Melissa & Doug Giant Giraffe - Lifelike Stuffed Animal (over 4 feet tall)”.
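The averaging step above can be sketched in plain Python. The vectors are the illustrative 4-dimensional values from the example; the field labels are just comments:

```python
# Component word embeddings for one product (illustrative values from the example).
embeddings = [
    [0.2, 0.4, -0.1, 0.6],    # title
    [-0.3, 0.1, 0.5, -0.2],   # brand
    [0.5, -0.3, 0.2, 0.7],    # character
    [0.1, -0.2, -0.5, 0.3],   # category
    [0.4, -0.1, 0.2, 0.6],    # tag 1
    [-0.2, 0.5, 0.1, 0.4],    # tag 2
    [-0.5, -0.3, -0.2, 0.1],  # tag 3
]

def average(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

product_embedding = average(embeddings)
print([round(x, 4) for x in product_embedding])  # [0.0286, 0.0143, 0.0286, 0.3571]
```

Averaging is the simplest pooling choice; weighted averages (e.g. giving the title more weight than tags) are a common refinement.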

How to find user’s likes/dislikes through embeddings?

Once we have an embedding for each product, we can calculate the similarity between user preferences and product embeddings using techniques like cosine similarity. Let's assume the user likes the brand "Melissa & Doug", the category "Stuffed Animal", and the tag "Stuffed Toys". Also, let's assume the user dislikes the character "Spiderman" and the tag "Building Toy".

User's likes embedding = (brand_embedding + category_embedding + 
tag1_embedding) / 3

= ([-0.3, 0.1, 0.5, -0.2] + [0.1, -0.2, -0.5, 0.3] + [0.4, -0.1, 0.2, 0.6]) / 3
= [0.2, -0.2, 0.2, 0.7] / 3

= [0.0667, -0.0667, 0.0667, 0.2333]

User's dislikes embedding = (tag4_embedding + character_embedding) / 2

= ([-0.3, 0.2, 0.6, -0.4] + [0.5, -0.3, 0.2, 0.7]) / 2
= [0.2, -0.1, 0.8, 0.3] / 2

= [0.1, -0.05, 0.4, 0.15]
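The same averaging, applied to the user's liked and disliked attributes (embedding values are the illustrative ones from above):

```python
def average(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

# Liked attributes: brand, category, tag (illustrative embeddings).
likes = average([
    [-0.3, 0.1, 0.5, -0.2],  # brand: Melissa & Doug
    [0.1, -0.2, -0.5, 0.3],  # category: Stuffed Animal
    [0.4, -0.1, 0.2, 0.6],   # tag: Stuffed Toys
])

# Disliked attributes: tag, character.
dislikes = average([
    [-0.3, 0.2, 0.6, -0.4],  # tag: Building Toy
    [0.5, -0.3, 0.2, 0.7],   # character: Spiderman
])

print([round(x, 4) for x in likes])     # [0.0667, -0.0667, 0.0667, 0.2333]
print([round(x, 4) for x in dislikes])  # [0.1, -0.05, 0.4, 0.15]
```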

Now, we shall calculate the similarity between the user's preferences and the product embeddings using cosine similarity.

[Image: Cosine Similarity formula: cos(θ) = (A · B) / (|A| |B|)]
Cosine Similarity (Product 1, User's likes)

From the calculations above:

Product vector (A): [0.0286, 0.0143, 0.0286, 0.3571]

Likes vector (B): [0.0667, -0.0667, 0.0667, 0.2333]

Dot Product (A . B) = (0.0286 * 0.0667) + (0.0143 * -0.0667) 
+ (0.0286 * 0.0667) + (0.3571 * 0.2333) = 0.086173

Magnitude of Product vector (A) = sqrt((0.0286)^2 + (0.0143)^2 
+ (0.0286)^2 + (0.3571)^2) = sqrt(0.129361) = 0.359668

Magnitude of Likes vector (B) = sqrt((0.0667)^2 + (-0.0667)^2 
+ (0.0667)^2 + (0.2333)^2) = sqrt(0.067776) = 0.260337

Cosine Similarity = Dot Product (A . B) / (Magnitude A * Magnitude B)
Cosine Similarity = 0.086173 / (0.359668 * 0.260337)
                  = 0.086173 / 0.093635 = 0.9203        

Therefore, the cosine similarity between the product vector and the likes vector is approximately 0.9203.

Cosine Similarity (Product 1, User's dislikes)

Product vector (A): [0.0286, 0.0143, 0.0286, 0.3571]

Dislikes vector (D): [0.1, -0.05, 0.4, 0.15]

Following the same steps as above, we find:

Cosine Similarity = 0.06715 / 0.158824 = 0.4228        

Therefore, the cosine similarity between the product vector and the dislikes vector is approximately 0.4228.

Since the cosine similarity between the product and the user's likes (0.9203) is greater than the cosine similarity between the product and the user's dislikes (0.4228), we recommend this product to the customer.
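The whole decision rule fits in a short script. The three vectors are the illustrative values worked out above (recomputed directly from the assumed embeddings, so the digits may differ slightly from hand-rounded intermediate steps):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

product  = [0.0286, 0.0143, 0.0286, 0.3571]   # averaged product embedding
likes    = [0.0667, -0.0667, 0.0667, 0.2333]  # user's likes embedding
dislikes = [0.1, -0.05, 0.4, 0.15]            # user's dislikes embedding

like_score    = cosine(product, likes)
dislike_score = cosine(product, dislikes)

# Recommend when the product sits closer to the user's likes than to their dislikes.
if like_score > dislike_score:
    print(f"recommend (likes={like_score:.4f} > dislikes={dislike_score:.4f})")
```

In a real system this comparison would run over the whole catalogue, ranking products by their likes-similarity (and penalising dislikes-similarity) rather than making a single yes/no call.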
