The Anatomy of Large-Scale Recommender Systems
Modern real-time recommender systems power many of today's most engaging platforms. While TikTok's implementation recently gained attention due to its US operation shutdown, similar architectures drive recommendations across social media, streaming, and e-commerce platforms. Here's what these systems typically look like under the hood.
The defining characteristic of modern recommender systems is their real-time nature. Every user interaction, whether a scroll, pause, or skip, immediately influences subsequent recommendations. This continuous feedback loop creates systems that adapt to user preferences within a single session rather than relying on pre-computed, static recommendations. TikTok was probably one of the first online services to nail the real-time aspect, adjusting the feed quickly based on in-session feedback.
Typical Serving Architecture
These systems typically employ a multi-stage serving architecture to handle billions of items and users while maintaining millisecond-level response times at low cost.
System overview of a modern online recommender system
Candidate Retrieval
Two-tower architectures dominate the retrieval phase, with separate neural networks for users and items enabling fast similarity search. This embedding-based approach generates the initial candidates, often combined with real-time signals such as trending content and recent uploads. The item embeddings are usually frozen, but the user/context embedding can be adjusted based on real-time feedback. As the illustration above shows, there might be multiple parallel retrieval calls.
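To make the retrieval step concrete, here is a minimal sketch in plain Python. It assumes tiny hand-written item vectors in place of real item-tower output, a brute-force dot-product scan in place of an approximate nearest-neighbor index, and a simple additive update rule for the user vector. The names `ITEM_EMBEDDINGS`, `retrieve`, and `update_user_vec` are hypothetical, and the update rule is an illustration, not how any particular production system does it.

```python
import heapq

# Hypothetical frozen item embeddings (in practice, output of the item tower).
ITEM_EMBEDDINGS = {
    "video_a": [1.0, 0.0],
    "video_b": [0.0, 1.0],
    "video_c": [0.7, 0.7],
}


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(user_vec, k):
    """Return the k item ids with the highest dot-product score."""
    return heapq.nlargest(
        k, ITEM_EMBEDDINGS, key=lambda item: dot(user_vec, ITEM_EMBEDDINGS[item])
    )


def update_user_vec(user_vec, item_id, reward, lr=1.0):
    """Move the user vector toward an item (reward > 0) or away from it (reward < 0).

    The item embeddings stay frozen; only the user vector changes.
    """
    item_vec = ITEM_EMBEDDINGS[item_id]
    return [u + lr * reward * i for u, i in zip(user_vec, item_vec)]


user = [1.0, 0.0]
before = retrieve(user, 2)

user = update_user_vec(user, "video_b", reward=1.0)   # e.g. watched to the end
user = update_user_vec(user, "video_a", reward=-1.0)  # e.g. skipped immediately
after = retrieve(user, 2)

print(before)  # ['video_a', 'video_c']
print(after)   # ['video_b', 'video_c']
```

Two feedback events are enough to change the top candidates here, while the item side of the index is never rebuilt. That is the property that makes frozen item embeddings plus a live user embedding attractive for real-time adaptation.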
Cascade Ranking
Retrieved candidates flow through a cascade of increasingly sophisticated models. Light models filter thousands of candidates first, followed by more complex models for final ranking. This staged approach balances computational efficiency with recommendation quality.
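The cascade can be sketched as a two-stage function. This is a toy illustration under stated assumptions: `light_score` and `heavy_score` are hypothetical stand-ins for a cheap and an expensive model, and the features (`popularity`, `topic`) and weights are made up for the example.

```python
def light_score(item):
    """Cheap first-stage score: a single precomputed feature."""
    return item["popularity"]


def heavy_score(item, user_interests):
    """Stand-in for an expensive model that also uses user context."""
    match = 1.0 if item["topic"] in user_interests else 0.0
    return 0.3 * item["popularity"] + 0.7 * match


def cascade_rank(candidates, user_interests, keep_after_light, final_k):
    """Filter with the light model, then rank the survivors with the heavy one."""
    shortlisted = sorted(candidates, key=light_score, reverse=True)[:keep_after_light]
    ranked = sorted(
        shortlisted, key=lambda it: heavy_score(it, user_interests), reverse=True
    )
    return [it["id"] for it in ranked[:final_k]]


candidates = [
    {"id": "a", "popularity": 0.9, "topic": "sports"},
    {"id": "b", "popularity": 0.8, "topic": "cooking"},
    {"id": "c", "popularity": 0.6, "topic": "music"},
    {"id": "d", "popularity": 0.1, "topic": "cooking"},
]

result = cascade_rank(
    candidates, {"cooking", "music"}, keep_after_light=3, final_k=2
)
print(result)  # ['b', 'c']
```

Item `d` matches the user's interests but never reaches the heavy model, because the light stage drops it first. The expensive model runs on three items instead of four, at the price of occasionally losing a good candidate early.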
System Evolution
As these systems scale, they often follow a consistent pattern: They start with compute-heavy models during the service bootstrap phase and then transition to lighter, more efficient models trained on rich interaction data. This evolution reflects the fundamental trade-off between compute & storage costs and recommendation quality at scale. It comes down to $ per user versus the ad revenue per user.
Traditional caching provides minimal benefit in these systems because recommendations are personalized to each user, so results can rarely be shared across users. These systems typically produce recommendations on every user view, which can drive significantly more traffic than search systems, where users must type a query.
IMHO: TikTok's key innovation wasn't just sophisticated models (THE ALGO)—it was building a recommender system that learns from every interaction in real-time. Unlike traditional batch-oriented systems, TikTok's algorithm instantly adapts to each pause, swipe, and skip, creating an addictively responsive experience. This real-time learning approach transformed what users expect from social platforms and set a new standard for recommendation systems.
If you want to dive deeper into the details of TikTok, ByteDance, the company behind the service, published this paper in 2022: Monolith: Real-Time Recommendation System With Collisionless Embedding Table.