Building the World’s Greatest Recommender System Part 21: Caching to Avoid Repeated Work
Every time we use a 12-trillion-parameter deep learning recommendation model (DLRM) to match users with recommended content, we should ask ourselves, “how can we avoid doing this again?” If a user just left their feed to check an email, do we really need to run the entire multi-stage ranking process when they come back, making requests to multiple machine learning models such as a Two Towers neural network retrieval model and a Multi-Task Multi-Label ranking model, all over again?
The short answer is “No!” Now, let’s understand how we avoid this repeated work through caching. Caching is a well-established technique for improving system performance by storing and reusing previously computed results.
In recommendation systems, a standard cache saves an ordered list of recommended items for a user. When the user returns, the system can serve these cached recommendations instead of generating an entirely new ordered list, reducing computational load and latency. Under the hood, this standard approach requires managing cache invalidation, discarding cached data once it has become too old (stale). It also requires managing cache consistency, in which the nodes of a distributed cache must all be updated to reflect the agreed-upon data in the “source of truth,” usually a database. While perhaps intriguing to some, the challenges of distributed caching are not unique to real-time machine learning systems and are generally abstracted (hidden) away from machine learning by most common caches (e.g., Redis or Meta’s TAO).
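For concreteness, here is a minimal sketch of the standard approach, assuming Redis as the cache; the recs:{user_id} key scheme and the 15-minute TTL are made-up placeholders standing in for a real invalidation policy:

```python
import json
import redis  # assumes a running Redis instance; any key-value store with TTLs works similarly

r = redis.Redis(host="localhost", port=6379)

CACHE_TTL_SECONDS = 15 * 60  # hypothetical staleness budget


def cache_recommendations(user_id: str, ranked_item_ids: list[str]) -> None:
    # Store the ordered list under a per-user key; Redis expires it automatically after the TTL.
    r.setex(f"recs:{user_id}", CACHE_TTL_SECONDS, json.dumps(ranked_item_ids))


def get_cached_recommendations(user_id: str) -> list[str] | None:
    # Returns None on a cache miss (or after TTL-based invalidation), signalling a full re-rank.
    raw = r.get(f"recs:{user_id}")
    return json.loads(raw) if raw is not None else None
```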
However, for recommender systems, traditional caching has its drawbacks. The primary issue is the staleness of cached data, which can hurt user engagement. When cached recommendations become outdated, they may no longer align with the user's current interests, leading to decreased user satisfaction. Case in point: a user may have been happy with Halloween ads before and during Halloween, but if a cache serves them Halloween ads after Halloween ends, they may be very dissatisfied.
To address this issue, we can take advantage of a smart caching system. A smart cache would store not only the items but also their ranking scores, and it would use a lightweight adjuster model that refreshes cached ranking scores before they are served, ensuring that recommendations remain relevant.
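One simple way to represent a smart-cache entry is to keep the score and its timestamp alongside the item, so the age of each score is always available to the adjuster; the field names below are purely illustrative:

```python
import time
from dataclasses import dataclass, field


@dataclass
class CachedRecommendation:
    user_id: str
    item_id: str
    stale_score: float  # ranking score produced by the full ranking model at cache time
    scored_at: float = field(default_factory=time.time)  # when the score was computed

    def age_seconds(self, now: float | None = None) -> float:
        # Time elapsed since scoring; one of the adjuster model's inputs.
        return (now if now is not None else time.time()) - self.scored_at
```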
Delving into the adjuster model: it would predict the fresh ranking score for a cached item from the stale score, the time elapsed since the score was cached, and standard model features. It would need to be significantly lighter and faster than the full ranking model, perhaps using gradient boosted regression trees (or potentially a lightweight neural network), allowing for substantial reductions in computational cost and latency without sacrificing recommendation quality.
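A rough sketch of such an adjuster using scikit-learn’s gradient boosted trees is shown below; the eight-column feature layout, the random placeholder training data, and the toy “relevance decays with age” labels are all assumptions standing in for real logged data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Each row describes one cached item: [stale_score, seconds_since_scored, ...standard features].
# Placeholder data for illustration; in practice the labels are fresh scores from the full model.
rng = np.random.default_rng(0)
X_train = rng.random((10_000, 8))
y_train = X_train[:, 0] * np.exp(-X_train[:, 1])  # toy label: relevance decays with age

adjuster = GradientBoostingRegressor(n_estimators=200, max_depth=4, learning_rate=0.05)
adjuster.fit(X_train, y_train)


def refresh_score(stale_score: float, age_seconds: float, other_features: list[float]) -> float:
    # Predict what the full ranking model would score this item now, at a fraction of the cost.
    # other_features must supply the remaining 6 columns used at training time.
    row = np.array([[stale_score, age_seconds, *other_features]])
    return float(adjuster.predict(row)[0])
```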
To keep the adjuster model’s outputs relevant, we could also enlist the full model’s help by training the adjuster with knowledge distillation. Knowledge distillation is the process of transferring knowledge from a large model to a smaller, more efficient one. This technique promises the best of both worlds: maintaining high accuracy while reducing compute cost, and therefore latency, since we would not want to wait for another heavyweight model to run on the cache contents. We should note that the smaller model used at inference time will never be as “expressive,” or capable, as the larger model, no matter how much distillation we do. Nevertheless, by periodically sending a subset of cached items to the primary model for fresh scoring, the system can collect the data needed to train the adjuster to greater accuracy.
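A sketch of that sampling loop might look like the following, where full_ranker.score is a stand-in for the production ranking model’s scoring interface and the entries are shaped like the cache-entry sketch above:

```python
import random


def collect_distillation_samples(cache_entries, full_ranker, sample_rate: float = 0.01):
    """Re-score a small random subset of cached items with the full (teacher) model
    so the adjuster (student) has fresh labels to train on."""
    samples = []
    for entry in cache_entries:  # entries shaped like the CachedRecommendation sketch above
        if random.random() < sample_rate:
            fresh_score = full_ranker.score(entry.user_id, entry.item_id)  # hypothetical teacher API
            samples.append({
                "stale_score": entry.stale_score,
                "age_seconds": entry.age_seconds(),
                "label": fresh_score,  # the value the adjuster learns to predict
            })
    return samples
```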
Smart caching not only addresses immediate performance needs but also offers flexibility to adapt to varying system conditions. By tuning parameters such as the maximum time of validity for cached data, the system can dynamically balance capacity and latency against accuracy and, therefore, user satisfaction. An innovative approach such as smart caching thus represents a significant advancement in the optimization of recommendation systems. By integrating a smart caching system with an adjuster model, we gain immediate compute and latency improvements, and we may also pave the way for integrating more sophisticated models and features in the future, with latency constraints lifted.
If you benefited from this post, please share so it can help others.