#25 - How does Reddit’s Personalization Model work
Originally published in The Times of India.
In the previous post, we discussed how X’s (formerly Twitter’s) personalization model works. This post juxtaposes Reddit’s personalization model against X’s so that we develop a coherent understanding of personalization models.
How X & Reddit are similar: X and Reddit both thrive on user-generated, community-driven content and engagement algorithms, enabling real-time discussions and content virality. In fact, Reddit is the only other social network that Apple’s App Store also classifies as a news product.
How X & Reddit are different: Reddit is a network of 100k subreddits centered on shared interests, while X emphasizes individual user accounts providing brief, real-time updates.
Historically, personalization on Reddit:
In July 2021, Reddit introduced a personalized feed: Instead of recommending subreddits, they started recommending posts directly in the user’s feed.
How it works
With this context out of the way, let’s get into how they’ve built it by placing Reddit’s model in the six stages that we introduced in the previous blog — Twitter’s Personalization Model:
1. Selection from the Corpus
The system starts with all Reddit submissions from the past 24 hours.
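To make the first stage concrete, here is a minimal sketch of a 24-hour selection window. The post dictionaries, the `created_at` field, and the `select_recent` function are hypothetical names used for illustration; they are not taken from Reddit's code.

```python
from datetime import datetime, timedelta, timezone


def select_recent(posts, now, window_hours=24):
    """Keep only posts created within the last `window_hours`."""
    cutoff = now - timedelta(hours=window_hours)
    return [p for p in posts if p["created_at"] >= cutoff]


# Hypothetical sample data: one fresh post and one that is 30 hours old.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
posts = [
    {"id": "a", "created_at": now - timedelta(hours=3)},
    {"id": "b", "created_at": now - timedelta(hours=30)},
]
recent = select_recent(posts, now)
```

Only post "a" survives, since post "b" falls outside the window.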
2. Candidate Generation
It then uses machine learning to identify posts from subreddits you’ve joined, subreddits similar to those you’ve joined, or subreddits you’ve visited recently. For diversity, it also recommends posts from subreddits that are popular or geographically popular.
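The candidate sources described above can be sketched as a simple union of subreddit sets. This is a simplification under stated assumptions: the `similar_to` lookup stands in for whatever learned similarity model Reddit actually uses, and all names and sample data are hypothetical.

```python
def generate_candidates(posts, joined, similar_to, visited, popular):
    """Tag each post with the source that made it a candidate.

    `similar_to` is a hypothetical mapping from a subreddit to the
    subreddits considered similar to it (a stand-in for a learned model).
    """
    similar = {s for j in joined for s in similar_to.get(j, ())}
    source_of = {}
    # Earlier sources take priority when a subreddit qualifies twice.
    for name, subs in [("joined", joined), ("similar", similar),
                       ("visited", visited), ("popular", popular)]:
        for sub in subs:
            source_of.setdefault(sub, name)
    return [dict(p, source=source_of[p["subreddit"]])
            for p in posts if p["subreddit"] in source_of]


posts = [{"id": i, "subreddit": s}
         for i, s in enumerate(["python", "rust", "cooking", "news", "chess"])]
candidates = generate_candidates(
    posts,
    joined={"python"},
    similar_to={"python": ["rust"]},
    visited={"cooking"},
    popular={"news"},
)
```

In this toy run, the post from "chess" is dropped because it matches none of the four sources.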
3. Filtering
They remove posts that are:
4. Scoring
An ML model assigns each remaining post a weighted score that combines several predictions: the probability of a click (CTR), the propensity to join (or leave) the subreddit, the propensity to comment on, upvote, or downvote the post, and, if the post has a video, the probability of watching it.
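As a rough illustration of a weighted score, the sketch below combines per-action probabilities into one number. The weights are invented for the example (Reddit has not published theirs), and the probabilities would in practice come from the model rather than a hand-written dictionary.

```python
# Hypothetical weights: positive actions add to the score,
# negative ones (leaving, downvoting) subtract from it.
WEIGHTS = {
    "click": 1.0,
    "join": 2.0,
    "leave": -2.0,
    "comment": 1.5,
    "upvote": 1.0,
    "downvote": -1.0,
    "watch": 0.5,
}


def weighted_score(probs, has_video):
    """Combine predicted action probabilities into a single score."""
    score = 0.0
    for action, weight in WEIGHTS.items():
        if action == "watch" and not has_video:
            continue  # watch probability only applies to video posts
        score += weight * probs.get(action, 0.0)
    return score


# Assumed model outputs for one post.
probs = {"click": 0.2, "join": 0.05, "leave": 0.01, "comment": 0.1,
         "upvote": 0.3, "downvote": 0.02, "watch": 0.4}
text_score = weighted_score(probs, has_video=False)
video_score = weighted_score(probs, has_video=True)
```

The same post scores higher when it carries a video, because the watch term is only counted in that case.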
Below are some interesting quotes from Reddit blogs:
Multi-task models have become particularly important at Reddit. Users engage with content in many ways, with many content types, and their engagement tells us what content and communities they value.
This type of training also implicitly captures negative feedback - content the user chose not to engage with, downvotes, or communities they unsubscribe from.
These probabilities can be used to estimate long term measures such as retention.
5. Re-ranking
At this point, Reddit doesn’t always put the highest-scoring posts at the top. Instead, they use sampling to inject:
The feed is curated to avoid showing too many similar posts in a row. Even if several posts have high scores, they might be spaced apart to enhance variety. Posts from different subreddits, topics, and formats (e.g., text, video, link) are interspersed to keep the feed engaging.
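The spacing idea can be sketched with a simple greedy pass. Note that this is a deterministic stand-in, not the sampling approach Reddit describes: it walks a score-sorted list and avoids placing two posts from the same subreddit back to back whenever an alternative exists. The function name and sample data are hypothetical.

```python
def space_out(ranked, key="subreddit"):
    """Reorder a score-sorted list so neighbours differ on `key` when possible."""
    remaining = list(ranked)
    feed = []
    while remaining:
        last = feed[-1][key] if feed else None
        # Take the best remaining post that differs from the previous one;
        # if none exists, fall back to the best remaining post.
        pick = next((p for p in remaining if p[key] != last), remaining[0])
        remaining.remove(pick)
        feed.append(pick)
    return feed


# Already sorted by score, highest first.
ranked = [
    {"id": "a1", "subreddit": "A"},
    {"id": "a2", "subreddit": "A"},
    {"id": "a3", "subreddit": "A"},
    {"id": "b1", "subreddit": "B"},
    {"id": "c1", "subreddit": "C"},
]
feed = space_out(ranked)
```

Here the three top-scoring posts from subreddit A end up interleaved with the posts from B and C. The same function could space posts by topic or format by passing a different `key`.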
Conclusion
I will continue reviewing additional product literature on personalization models employed across various media products, but it is likely that the six stages mentioned above will remain applicable.
- - -
Curious how I’m managing to write? I created a CustomGPT for myself, which serves as my go-to editor and audits my first draft. Here’s the link—give it a spin! It’s free to use. https://chatgpt.com/g/g-hgI62sWPm-mediaflywheels-review-opinion-pieces
Want to republish it? This post was released under CC BY-ND. You can republish it as is with the following credit and backlinks: ‘Originally published by Ritvvij Parrikh on The Times of India.’ The author retains the copyright and any other ancillary rights to the post.