
Modern Visual RecSys: How to Design a Recommender?

Kai Xin Thia

VP, Head of AI & Data Analytics at ST Engineering

Published: March 15, 2020

In this chapter, I will introduce the RecSys Design Framework, using Amazon as a case study.

This is part of my Modern Visual RecSys series; feel free to check out the rest of the series at the end of the article.

RecSys Framework — Amazon case study

Recommendations on my Amazon homepage

An eCommerce website like Amazon relies heavily on a good RecSys. After all, users cannot be expected to browse through the millions of products on the platform, while sellers want exposure for their products. Furthermore, space on the website/app is limited; a good RecSys should match the user’s preferences with the most relevant products and display them in an order that encourages the user to click on or purchase an item.

How do we go about designing a RecSys from scratch? Let me share with you a framework I use.

Step 1: Define the business case

There is no clear “correct answer” for RecSys. Looking at the Amazon recommendations above, how do you tell if the RecSys is doing a great job? If I bought socks and monitors in the past, does it still make sense to recommend more socks and monitors to me? RecSys is all about improving user experience and business KPIs. It is thus vital to understand the metrics involved:

RecSys metrics:

“How well it matches historical trends”: accuracy / relevancy / coverage
“How diverse are the recommendations”: diversity / serendipity / novelty
Respecting user privacy
Cold start problem (the challenge of recommending to new users)
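To make the historical and diversity metrics above more concrete, here is a minimal Python sketch of three common offline definitions: precision@k (relevancy), catalog coverage, and intra-list diversity. The function names, the toy items, and the choice of Jaccard distance over category tags as the diversity measure are illustrative assumptions, not definitions taken from this article.

```python
from itertools import combinations


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that the user actually interacted with."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def catalog_coverage(rec_lists, catalog):
    """Share of the catalog that appears in at least one user's recommendation list."""
    shown = set()
    for recs in rec_lists:
        shown.update(recs)
    return len(shown & set(catalog)) / len(catalog)


def intra_list_diversity(recs, item_tags):
    """Average pairwise Jaccard distance between item tag sets (higher = more diverse)."""
    pairs = list(combinations(recs, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        tags_a, tags_b = item_tags[a], item_tags[b]
        union = tags_a | tags_b
        # Two items with no tags at all are treated as identical (distance 0)
        similarity = len(tags_a & tags_b) / len(union) if union else 1.0
        total += 1.0 - similarity
    return total / len(pairs)


# Hypothetical toy data
item_tags = {
    "socks": {"apparel"},
    "monitor": {"electronics"},
    "hdmi_cable": {"electronics", "accessory"},
}
recs = ["socks", "monitor", "hdmi_cable"]
print(precision_at_k(recs, relevant={"monitor"}, k=2))  # 0.5
print(catalog_coverage([recs, ["socks"]], catalog=list(item_tags) + ["lamp"]))  # 0.75
print(intra_list_diversity(recs, item_tags))  # about 0.83
```

Note that these metrics pull in different directions: a list of three near-identical monitors may score well on precision but poorly on diversity, which is why no single number settles whether the RecSys is "doing a great job".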

Business metrics:

Clickthrough rate (CTR)
Purchase rate
Introducing new products / sellers

While accuracy/relevancy metrics give us a ballpark figure of how well our RecSys matches historical trends, they do not show the full picture.

Users cannot judge items they have not yet seen, so the historical logs contain no feedback on them. In essence, a RecSys can perform poorly on historical measures while introducing many new products to users and increasing the overall purchase rate. It is crucial to measure a mix of historical, diversity, and business metrics through A/B testing across different user demographics.
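As a rough illustration of how a business metric such as CTR might be compared in an A/B test, the sketch below runs a two-sided two-proportion z-test on click counts from two variants. The click and view numbers are hypothetical, and the z-test is one standard choice among several (chi-squared tests or Bayesian comparisons are common alternatives); in practice you would repeat it per user segment and guard against peeking at results too early.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in click-through rate between variants A and B."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis that both variants perform the same
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    if se == 0:
        return p_a, p_b, 0.0, 1.0  # no variation at all (e.g. zero clicks everywhere)
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a, p_b, z, p_value


# Hypothetical results: A = current RecSys, B = a more diverse candidate
ctr_a, ctr_b, z, p_value = two_proportion_z_test(500, 10_000, 580, 10_000)
print(f"CTR A={ctr_a:.3f}, CTR B={ctr_b:.3f}, z={z:.2f}, p={p_value:.4f}")
```

With these made-up numbers, z comes out at roughly 2.5, so variant B's higher CTR would be unlikely under the assumption that both variants perform equally, even if B scored worse on a historical accuracy metric.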

Step 2: Prepare the Data

Let us take a look at the patent Amazon filed in 1998 (US 6,266,649 B1: Collaborative recommendations using item-to-item similarity mappings), which lays foundations for RecSys that are still relevant today.

Source: Patent US 6,266,649 B1: Collaborative recommendations using item-to-item similarity mappings by Amazon

From the patent image above, we can outline the various pieces of data that are found in all modern RecSys:

User interactions (shown as web server / HTML in the image). These are usually clickstream data stored as log files and accessed via a publish-subscribe messaging system such as Kafka. A common technique for tracking users is browser fingerprinting, which raises user privacy concerns.
User profiles (purchase history, ratings, shopping cart, wishlist, comments, recently browsed items, etc.). These are usually transactional, structured data stored in customer databases. They form the baseline for persona creation, segmentation, targeting, retention, and reactivation. A lot of work also goes into detecting user fraud.
Seller profiles (ratings, sales, price range, promotions, partnerships, novelty, catalog, etc.). These are missing from the image above but are essential in any modern RecSys design. Sellers are the platform’s lifeblood and its content partners. This data is critical for product promotions, new product launches, and the ranking/placement of products on the site.
Item attributes (price history, category, sales, quantity in stock, consolidation across multiple sellers, images, fake product checks, etc.). This data is vital to item similarity models. Product image similarity will be a core part of this series of articles.
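Before any modeling, the raw interaction data usually has to be aggregated into a user-item table. The sketch below shows one simple way to do that with implicit feedback, where stronger signals count more. The (user_id, item_id, event_type) tuple schema, the event names, and the weights in EVENT_WEIGHTS are all hypothetical; in a real system the events would be consumed from the log files or message stream described above rather than from a Python list.

```python
from collections import defaultdict

# Hypothetical implicit-feedback weights: a purchase says more than a page view
EVENT_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "purchase": 5.0}


def build_interaction_matrix(events):
    """Aggregate (user_id, item_id, event_type) clickstream rows into user -> item -> score."""
    matrix = defaultdict(lambda: defaultdict(float))
    for user_id, item_id, event_type in events:
        weight = EVENT_WEIGHTS.get(event_type)
        if weight is None:
            continue  # ignore event types we do not model
        matrix[user_id][item_id] += weight
    # Convert the nested defaultdicts to plain dicts for downstream use
    return {user: dict(items) for user, items in matrix.items()}


# Hypothetical clickstream sample
events = [
    ("u1", "socks", "view"),
    ("u1", "socks", "purchase"),
    ("u1", "monitor", "view"),
    ("u2", "monitor", "add_to_cart"),
    ("u2", "monitor", "scroll"),  # unmodeled event type, skipped
]
print(build_interaction_matrix(events))
# {'u1': {'socks': 6.0, 'monitor': 1.0}, 'u2': {'monitor': 3.0}}
```

The same table can then be joined with user, seller, and item attributes, which is where combining data sources starts to pay off.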

Use data in combination: the power of the data, and the insights it yields, multiply when you combine sources. The image outlines three key relationships underlying most RecSys: popular item + similar item, customer + item, and market basket analysis.
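The "similar item" relationship from the patent's title can be sketched as item-to-item collaborative filtering: two items are similar if the same customers bought them. The Python sketch below computes cosine similarity between item vectors, where each item is represented by the customers who bought it. The customer names and purchases are hypothetical toy data, and the naive all-pairs loop is for clarity only; a production version would only compare items that actually share customers.

```python
import math
from collections import defaultdict


def item_item_cosine(user_items):
    """Cosine similarity between items, using the customers who bought them as vector dimensions."""
    # Invert user -> {item: score} into item -> {user: score}
    item_vectors = defaultdict(dict)
    for user, items in user_items.items():
        for item, score in items.items():
            item_vectors[item][user] = score
    norms = {item: math.sqrt(sum(s * s for s in vec.values())) for item, vec in item_vectors.items()}

    sims = defaultdict(dict)
    items = sorted(item_vectors)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            vec_a, vec_b = item_vectors[a], item_vectors[b]
            dot = sum(score * vec_b[u] for u, score in vec_a.items() if u in vec_b)
            if dot == 0:
                continue  # no shared customers, so no similarity entry
            sim = dot / (norms[a] * norms[b])
            sims[a][b] = sim
            sims[b][a] = sim
    return sims


def similar_items(sims, item, k=3):
    """Top-k most similar items: a "customers who bought this also bought" list."""
    neighbors = sims.get(item, {})
    return sorted(neighbors, key=neighbors.get, reverse=True)[:k]


# Hypothetical purchase history (1 = purchased)
purchases = {
    "alice": {"socks": 1, "shoes": 1},
    "bob": {"socks": 1, "shoes": 1, "monitor": 1},
    "carol": {"monitor": 1, "hdmi_cable": 1},
}
sims = item_item_cosine(purchases)
print(similar_items(sims, "socks"))  # ['shoes', 'monitor']
```

Because the similar-items table depends only on the item catalog and purchase history, it can be precomputed in batch and looked up cheaply at request time, which connects naturally to the architecture discussion in the next step.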

Step 3: Design a suitable architecture

Source: System Architectures for Personalization and Recommendation by Xavier Amatriain and Justin Basilico

The system architecture of a RecSys is usually a closely guarded secret and differs based on scale and requirements. One good baseline is the Netflix architecture, described by Xavier Amatriain and Justin Basilico in the source above. Netflix hosts its RecSys on AWS and divides the system into three time horizons:

Offline (batch processing): models that are computationally heavy and take a long time to run. Models that need to generate long-term relationship pairs across all users and products, such as collaborative filtering models, fall into this category. We usually run these models at scheduled intervals, every few hours throughout the day. They are often the most complex and accurate models.
Nearline (semi real-time / micro-batch processing): modern frameworks like Spark allow data to be processed every 5 to 10 seconds, which is great for modeling short-term behavioral patterns such as browsing. We can update recommendations based on what the user is looking at and recommend complementary items for what they add to the cart.
Online (real-time): real-time models can be costly, and they are unnecessary for most use cases that semi real-time processing can handle. Because real-time responses must be returned within roughly 10 ms, little modeling can be done at this speed. Most likely, we will pre-generate the results (such as the most popular items) and serve them from an in-memory store such as Redis or a cache. Real-time testing, in the form of A/B tests or multi-armed bandits, is also critical to the success of any RecSys.
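The division of labor between the three time horizons can be illustrated with a toy, single-process Python sketch. Everything here is hypothetical: RecommendationStore wraps a plain dictionary as a stand-in for Redis or a cache, the three functions stand in for a scheduled batch job, a streaming micro-batch job, and a request handler, and the neighbor table is toy data. It is not Netflix's implementation and says nothing about real latencies.

```python
from collections import Counter


class RecommendationStore:
    """In-memory stand-in for a key-value store such as Redis (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = list(value)

    def get(self, key):
        return self._data.get(key)


def _score_neighbors(items, item_neighbors):
    """Count how often each unseen item appears among the neighbors of the given items."""
    scores = Counter()
    for item in items:
        for neighbor in item_neighbors.get(item, []):
            if neighbor not in items:
                scores[neighbor] += 1
    return scores


def offline_batch_job(store, user_history, item_neighbors, top_n=3):
    """Offline: precompute lists for every user plus a global popularity fallback."""
    popularity = Counter(item for items in user_history.values() for item in items)
    store.set("popular", [item for item, _ in popularity.most_common(top_n)])
    for user, items in user_history.items():
        scores = _score_neighbors(items, item_neighbors)
        store.set(f"user:{user}", [item for item, _ in scores.most_common(top_n)])


def nearline_on_add_to_cart(store, user, cart, item_neighbors, top_n=3):
    """Nearline: refresh one user's list from the current cart (complementary items)."""
    scores = _score_neighbors(cart, item_neighbors)
    if scores:
        store.set(f"user:{user}", [item for item, _ in scores.most_common(top_n)])


def online_serve(store, user):
    """Online: no modeling, just a fast lookup with a popular-items fallback for cold-start users."""
    return store.get(f"user:{user}") or store.get("popular") or []


# Hypothetical data
history = {"alice": ["socks", "shoes"], "bob": ["monitor"], "carol": ["monitor", "socks"]}
neighbors = {
    "socks": ["shoes", "slippers"],
    "shoes": ["socks", "insoles"],
    "monitor": ["hdmi_cable", "stand"],
}
store = RecommendationStore()
offline_batch_job(store, history, neighbors)            # e.g. every few hours
nearline_on_add_to_cart(store, "bob", ["monitor", "socks"], neighbors)  # e.g. every few seconds
print(online_serve(store, "bob"))   # refreshed from the cart
print(online_serve(store, "dave"))  # new user: falls back to popular items
```

The design choice to keep the online path a pure lookup is what makes the tight latency budget achievable: all scoring happens ahead of time in the offline and nearline layers.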

Designing a good RecSys architecture takes experience, time, and an understanding of stakeholder requirements (metrics, data, budget, time, etc.).

The key is to start small with offline models, scale towards semi real-time models, and always be testing in real-time.
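"Always be testing" does not have to mean fixed-split A/B tests only. A multi-armed bandit shifts traffic toward the better-performing variant while it is still learning. Below is a minimal epsilon-greedy sketch, one of the simplest bandit strategies, choosing between two hypothetical recommendation strategies; the arm names, the simulated CTRs, and epsilon = 0.1 are all made-up assumptions, and nothing here claims that Amazon or Netflix use this exact method.

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy multi-armed bandit over competing recommendation strategies."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in self.arms}
        self.rewards = {arm: 0.0 for arm in self.arms}

    def mean_reward(self, arm):
        """Observed CTR of an arm (0.0 before it has been shown)."""
        return self.rewards[arm] / self.pulls[arm] if self.pulls[arm] else 0.0

    def select_arm(self):
        # Explore a random arm with probability epsilon, otherwise exploit the best observed CTR
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=self.mean_reward)

    def update(self, arm, reward):
        # reward is 1.0 for a click and 0.0 otherwise
        self.pulls[arm] += 1
        self.rewards[arm] += reward


# Simulated traffic with hypothetical true CTRs that the bandit does not know
true_ctr = {"popular_items": 0.03, "item_to_item": 0.06}
bandit = EpsilonGreedyBandit(true_ctr, epsilon=0.1, seed=42)
sim_rng = random.Random(7)
rounds = 20_000
for _ in range(rounds):
    arm = bandit.select_arm()
    clicked = sim_rng.random() < true_ctr[arm]
    bandit.update(arm, 1.0 if clicked else 0.0)
print(bandit.pulls)
```

In a run like this, the stronger arm should typically end up with most of the traffic, whereas a 50/50 A/B test would keep sending half of all users to the weaker variant until the test ends. The trade-off is that bandits give less clean statistical comparisons than a fixed split.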

What have we learned

It is not easy to design a RecSys. We should always start with the business problem. Do we have a current baseline solution? What are the expectations/goals? How many resources are we willing to put into this project? Next, we should evaluate our data sources. Are we tracking user interactions? Do we have access to user and item data? How much historical data do we have to work with? The answers to these questions will determine the kind of architecture we design to meet the requirements.

Reflections

Imagine your RecSys team consists of business stakeholders, designers, front-end and back-end engineers, data engineers, and fellow data scientists. What questions would you ask each of them?
Below is a figure from Amazon’s patent filing showing how it generates shopping cart and instant recommendations. Do the workflows make sense? What clarifications would you like from the data scientists who designed them? How would you improve the workflows with modern tools/processes?

Source: Patent US 6,266,649 B1: Collaborative recommendations using item-to-item similarity mappings by Amazon
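As a starting point for this exercise, here is a sketch of one common way to turn per-item similar-items lists into shopping-cart recommendations: merge the lists of every item in the cart, sum the similarity scores, and drop items the user already has. This is an illustrative assumption, not a transcription of the patent's flowchart; the function name, the similarity table, and its scores are hypothetical.

```python
from collections import defaultdict


def cart_recommendations(cart, similar_items, purchased=(), top_n=5):
    """Merge the precomputed similar-items lists of every cart item into one ranked list."""
    excluded = set(cart) | set(purchased)
    scores = defaultdict(float)
    for item in cart:
        for candidate, similarity in similar_items.get(item, []):
            if candidate in excluded:
                continue  # do not recommend what the user already has
            # Items similar to several cart items accumulate score and rise to the top
            scores[candidate] += similarity
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:top_n]]


# Hypothetical precomputed table: item -> [(similar item, similarity score), ...]
similar = {
    "monitor": [("hdmi_cable", 0.9), ("monitor_stand", 0.7), ("keyboard", 0.4)],
    "laptop": [("keyboard", 0.6), ("laptop_bag", 0.8), ("monitor", 0.5)],
}
print(cart_recommendations(["monitor", "laptop"], similar, purchased=["laptop_bag"]))
# ['keyboard', 'hdmi_cable', 'monitor_stand']
```

Questions worth asking about a workflow like this include how scores from different cart items should be weighted, whether out-of-stock items are filtered, and how often the similarity table is refreshed.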

Explore the rest of Modern Visual RecSys Series

How does a Recommender Work? [Foundational]
How to Design a Recommender? [Foundational][we are here]
Intro to Visual RecSys [Core]
Convolutional Neural Networks Recommender [Pro]
COVID-19 Case Study with CNN [Pro]
Building a Personalized Real-Time Fashion Collection Recommender [Pro]
Temporal Modeling [Pro]
The Future of Visual Recommender Systems: Four Practical State-Of-The-Art Techniques [Foundational]

Series labels:

Foundational: general knowledge and theories, minimal coding experience needed.
Core: more challenging materials with code.
Pro: difficult materials and code, with production-grade tools.
