#23 - How does X or Twitter's Personalization Model work

Ritvvij Parrikh

Building Stuff

Published: December 24, 2024

Originally published in The Times of India.

Every major Big Tech product you use is powered by a recommender model. This blog series will help you build a clear understanding of how recommender models work. In the first blog, we’ll explore X’s recommender model.

X is a News Product

This blog explains what happens behind the scenes in X’s “For You” feed, Search, Explore (Trending), and Ads. Both X (formerly known as Twitter) and news products address a similar user need: helping users stay informed about what’s happening around the world—essentially, news.

Apple’s App Store recognizes this, which is why it lists X in the News category.

Elon knows this and hence he is doubling down on this positioning:

In 2023, Elon Musk open-sourced X’s recommender model.

Here’s X’s architecture diagram that Elon Musk had shared:

Recommender architectures progressively reduce a huge corpus (millions or billions of items) to a relevant set of recommendations that the user can act upon. The process can have six distinct stages:

Selecting from the Corpus
Candidate Generation
Filtering
Scoring or Ranking
Re-Ranking or Mixing
Serving

In most literature on recommenders, steps 1 and 2 are combined into one. However, I am choosing to separate them because interesting choices can be made at the query layer too.

Step 1: Selecting from the Corpus

Every media product generates a vast amount of content daily. For X, this amounts to approximately 500 million tweets per day. However, content from previous days or weeks can still be relevant. Processing all of this content would be computationally expensive, so recommender systems start by using a basic SQL filter to narrow down the archive into a manageable list.
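The kind of basic SQL filter described above can be sketched as follows. This is only an illustration, not X's actual query: the `tweets` table, its columns, the seven-day window, and the `min_likes` threshold are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def select_recent_tweets(conn, now, max_age_days=7, min_likes=1):
    """Return ids of tweets newer than the cutoff that have some engagement.

    The window and the likes threshold are hypothetical knobs.
    """
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    rows = conn.execute(
        "SELECT id FROM tweets "
        "WHERE created_at >= ? AND likes >= ? "
        "ORDER BY created_at DESC",
        (cutoff, min_likes),
    )
    return [row[0] for row in rows]


# Tiny in-memory archive standing in for the full corpus.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (id INTEGER PRIMARY KEY, created_at TEXT, likes INTEGER)"
)
now = datetime(2024, 12, 24, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        (1, (now - timedelta(days=1)).isoformat(), 12),   # recent, engaged
        (2, (now - timedelta(days=30)).isoformat(), 900), # popular but too old
        (3, (now - timedelta(hours=2)).isoformat(), 0),   # recent, no engagement
        (4, (now - timedelta(days=3)).isoformat(), 5),    # recent, engaged
    ],
)

recent_ids = select_recent_tweets(conn, now)
print(recent_ids)
```

The point is that this step stays cheap: it uses only indexed columns and no per-user model, leaving the expensive work for later stages.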

Step 2: Candidate Generation

The next step needs to be computationally efficient so that it can quickly narrow down the selection list (millions or billions of items) to a small subset of content that is both consumable and relevant to the user. There can be multiple candidate generator models, each producing a separate list. At this stage, the model doesn’t precisely rank content but filters it down to a quantity the user is likely to consume.

How X approaches this

For X, this stage narrows the pool down to around 1,500 candidate tweets each time the feed is requested.

X recognizes the importance of avoiding echo chambers or rabbit holes by not serving users only what they are interested in. Additionally, users don’t come to X to dive deeply into a single topic—they typically want a mix of content they enjoy alongside other engaging or relevant topics. To achieve this balance, X’s model retrieves 50% of the 1,500 tweets from within your network (people you follow) and 50% from outside your network.
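As a rough sketch of that 50/50 split, assume each source has already produced its own list, best first. The function name and parameters below are hypothetical; only the 1,500 total and the even split come from the text above.

```python
def blend_candidates(in_network, out_of_network, total=1500, in_network_share=0.5):
    """Take a fixed share of candidates from each source.

    Both inputs are assumed to be pre-sorted, best first.
    """
    in_quota = round(total * in_network_share)
    out_quota = total - in_quota
    return in_network[:in_quota] + out_of_network[:out_quota]


# Placeholder ids standing in for real tweets.
in_network = [f"in-{i}" for i in range(1000)]
out_of_network = [f"out-{i}" for i in range(1000)]

candidates = blend_candidates(in_network, out_of_network)
print(len(candidates))
```

With the defaults, this yields 750 tweets from each source, matching the two parts described next.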

Part 1 - Here’s how X finds 750 tweets from within a user’s network:

X uses a search engine to identify tweets posted by the people you follow using the following criteria:

Engagement likelihood: A graph-based model predicts how likely you are to interact with the tweet’s author.
Author’s influence
Trust and safety: A filter removes tweets deemed NSFW, toxic, or abusive.
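One way to picture how the three criteria above could combine is the sketch below. It is a simplification under stated assumptions: the scores are hand-written dictionaries rather than model outputs, and multiplying engagement likelihood by author influence is my own illustrative choice, not a documented formula.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    tweet_id: int
    author: str
    is_safe: bool  # assumed output of a trust-and-safety filter


def rank_in_network(tweets, engagement_likelihood, author_influence, limit=750):
    """Drop unsafe tweets, then sort the rest by a combined author score."""
    safe = [t for t in tweets if t.is_safe]

    def score(tweet):
        # Hypothetical combination: likelihood of interaction times influence.
        return (engagement_likelihood.get(tweet.author, 0.0)
                * author_influence.get(tweet.author, 0.0))

    return sorted(safe, key=score, reverse=True)[:limit]


tweets = [
    Tweet(1, "alice", True),
    Tweet(2, "bob", True),
    Tweet(3, "carol", False),  # removed by the safety filter
]
engagement_likelihood = {"alice": 0.9, "bob": 0.4, "carol": 0.95}
author_influence = {"alice": 0.5, "bob": 0.8, "carol": 0.9}

ranked_ids = [t.tweet_id
              for t in rank_in_network(tweets, engagement_likelihood, author_influence)]
print(ranked_ids)
```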

Over the years, X’s search engine has learned to find key ideas without relying on hashtags.

Part 2 - Here’s how X finds 750 tweets from outside a user’s network:

This part leverages and strengthens X’s network effects by recommending tweets from accounts you do not follow.

2nd Degree Connections or Friends-of-Friends (FoF): Much like collaborative filtering, X recommends tweets from people with similar interests or from people whom those you follow engage with, following the principle of “a friend’s friend is a friend.”
Communities: Beyond FoF, X cannot scan tweets from all of its daily active users (hundreds of millions). Hence, X clusters its entire DAU base into approximately 100k communities using user/tweet embeddings. It then finds the communities you are part of and recommends tweets that are either trending within those communities or posted by influencers within them.
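The friends-of-friends idea can be sketched with plain dictionaries: count how many of the accounts you follow liked a tweet whose author you do not follow. The data structures and names here are hypothetical, and real systems work on a graph store rather than in-memory dicts.

```python
from collections import Counter


def friends_of_friends_candidates(user, follows, likes, authors, limit=750):
    """Rank out-of-network tweets by how many followed accounts liked them."""
    followed = follows.get(user, set())
    counts = Counter()
    for friend in followed:
        for tweet_id in likes.get(friend, set()):
            author = authors[tweet_id]
            # Keep only tweets from outside the user's own network.
            if author != user and author not in followed:
                counts[tweet_id] += 1
    # Most-liked first; ties broken by tweet id so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tweet_id for tweet_id, _ in ranked[:limit]]


follows = {"me": {"ana", "ben"}}
likes = {"ana": {101, 102}, "ben": {102, 103}}
authors = {101: "ben", 102: "cara", 103: "dev"}

fof_ids = friends_of_friends_candidates("me", follows, likes, authors)
print(fof_ids)
```

Tweet 101 is skipped because its author is already followed, which is what keeps this list out-of-network.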

One downside of the Communities feature is that it helps big accounts grow more. Elon recognized this and probably rolled out a way to fix this.

Step 3: Filtering

At this stage, use business rules to remove ineligible or irrelevant items.

How X approaches this

X removes tweets that:

The user has given negative feedback on: tweets from accounts they have muted or blocked, tweets containing words they have muted, and tweets they have explicitly marked ‘Not interested in this post.’
Are restricted in the user’s country.
Have been recently deleted or edited.
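These rules amount to a chain of simple eligibility checks. The sketch below is a hypothetical rendering of them; the `Candidate` and `Viewer` types and their field names are invented for illustration, and muted words are assumed to be stored in lowercase.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    tweet_id: int
    author: str
    text: str
    restricted_in: frozenset = frozenset()  # country codes where it is withheld
    deleted_or_edited: bool = False


@dataclass
class Viewer:
    country: str
    muted_or_blocked: set = field(default_factory=set)
    muted_words: set = field(default_factory=set)   # lowercase words
    not_interested: set = field(default_factory=set)  # tweet ids


def is_eligible(tweet, viewer):
    """Apply the business rules one by one; any failed rule removes the tweet."""
    if tweet.deleted_or_edited:
        return False
    if tweet.author in viewer.muted_or_blocked:
        return False
    if tweet.tweet_id in viewer.not_interested:
        return False
    if viewer.country in tweet.restricted_in:
        return False
    words = set(tweet.text.lower().split())
    return not (words & viewer.muted_words)


viewer = Viewer(country="IN", muted_or_blocked={"spammer"},
                muted_words={"spoiler"}, not_interested={5})
pool = [
    Candidate(1, "ana", "Election results are in"),
    Candidate(2, "spammer", "Buy now"),
    Candidate(3, "ben", "Huge spoiler ahead"),
    Candidate(4, "cara", "Withheld post", restricted_in=frozenset({"IN"})),
    Candidate(5, "dev", "Already dismissed"),
    Candidate(6, "eli", "Gone", deleted_or_edited=True),
]

eligible_ids = [t.tweet_id for t in pool if is_eligible(t, viewer)]
print(eligible_ids)
```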

Step 4: Scoring or Ranking

Okay, we are almost there.

The simplest comparison that I can think of is that all the steps so far were like your city’s metro network. It gets you from place A to place B (Shivaji Park in the map below). But you still need last-mile connectivity to reach home (big pink line). This step gets you there. In this step, you bring in each user’s preference.

The goal of this step is to truly 1:1 personalize — rank the retrieved content by the likelihood of achieving the target outcome. To generate a higher precision result, this stage would:

Take in richer feature sets (user context or item-specific details)
Use more sophisticated models, typically deep learning.

Also, this ranking cannot be precomputed, so it is typically done in real time just before content is served to the frontend.

What does “likely to consume” mean?

Different platforms use different metrics:

First-order metrics: Primary actions like clickthrough rate (CTR) for long-form platforms like YouTube or session time for short-form platforms like X.
Second/third-order metrics: Mature models account for post-CTR actions (e.g., read time, repeat sessions, conversion). Platforms like Uber Eats optimize for conversion (CVR) — eaters ordering from a particular store after it is shown to them on the home feed.

In the below tweet, Elon acknowledges that the recommender algorithm is optimized for engagement time on app and thus it automatically promotes videos over text-only tweets.

Does the algorithm downrank tweets with links because it knows it has to optimize for user-seconds on platform?

How X approaches this

X computes the probability of different engagement types (Like, Retweet, Reply, Share, Bookmark, etc.), assigns weights to each type, and calculates a weighted average to sort the tweets.
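A minimal sketch of that weighted average follows. The weights and probabilities are made-up numbers chosen only to show the mechanics; they are not X's actual values, and in practice the probabilities would come from a trained model.

```python
# Hypothetical weights: a higher weight means the engagement type matters more.
WEIGHTS = {"like": 1.0, "retweet": 2.0, "reply": 3.0}


def engagement_score(probabilities, weights=WEIGHTS):
    """Weighted average of predicted engagement probabilities."""
    total_weight = sum(weights.values())
    weighted = sum(weights[kind] * probabilities.get(kind, 0.0) for kind in weights)
    return weighted / total_weight


# Assumed model outputs for two tweets.
predictions = {
    "a": {"like": 0.30, "retweet": 0.05, "reply": 0.01},
    "b": {"like": 0.10, "retweet": 0.10, "reply": 0.10},
}

ranked = sorted(predictions, key=lambda t: engagement_score(predictions[t]), reverse=True)
print(ranked)
```

Tweet "b" outranks "a" despite fewer expected likes, because replies and retweets carry more weight in this toy setup. This is how the choice of weights steers what the feed rewards.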

Step 5: Re-Ranking or Mixing

Yes, technically, the previous stage has already sorted the results. However, this stage is necessary to incorporate last-minute adjustments specific to your application. For example:

To avoid echo chambers or loss of interest, platforms might choose to serve a balanced mix of content, including relevance, similar-but-novel, novelty, and diversity.
Since the feed is unified, product interventions like prompts and onboarding can be seamlessly integrated.
Finally, ads may also be included based on your revenue model.
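Two of these adjustments, diversity and ad insertion, can be sketched in a few lines. The per-author cap and the ad spacing below are hypothetical parameters, not values X has published.

```python
def rerank(ranked_tweets, ads, max_per_author=2, ad_every=3):
    """Cap tweets per author, then slot an ad after every few tweets.

    ranked_tweets is a list of (tweet_id, author) pairs, best first.
    """
    per_author = {}
    diverse = []
    for tweet_id, author in ranked_tweets:
        if per_author.get(author, 0) < max_per_author:
            diverse.append(tweet_id)
            per_author[author] = per_author.get(author, 0) + 1

    feed = []
    remaining_ads = iter(ads)
    for position, tweet_id in enumerate(diverse, start=1):
        feed.append(tweet_id)
        if position % ad_every == 0:
            ad = next(remaining_ads, None)  # stop inserting once ads run out
            if ad is not None:
                feed.append(ad)
    return feed


ranked_tweets = [
    ("t1", "ana"), ("t2", "ana"), ("t3", "ana"),
    ("t4", "ben"), ("t5", "cara"), ("t6", "dev"),
]

feed = rerank(ranked_tweets, ads=["ad1"])
print(feed)
```

The third tweet from the same author is dropped even though it scored well, which is the trade-off this stage makes: a slightly less "optimal" feed in exchange for variety.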

Conclusion

This layered approach allows X to curate content efficiently while balancing personalization, diversity, and user engagement.

Curious how I’m managing to write? I created a CustomGPT for myself, which serves as my go-to editor and audits my first draft. Here’s the link—give it a spin! It’s free to use. https://chatgpt.com/g/g-hgI62sWPm-mediaflywheels-review-opinion-pieces

Want to republish it? This post was released under CC BY-ND. You can republish it as is with the following credit and backlinks: ‘Originally published by Ritvvij Parrikh on The Times of India.’ The author retains the copyright and any other ancillary rights to the post.
