5 challenges you face when building cutting-edge recommender systems
Krishna Yogi Kolluru
Recommender systems play a pivotal role in shaping our digital experiences, guiding us through a sea of content on online platforms. From the allure of clickbait to the influence of popularity, these systems are not immune to biases impacting user recommendations.
In this blog, we’ll learn about the intricacies of five prevalent biases in recommender systems and explore recent research breakthroughs from industry giants like Google, YouTube, Netflix, Kuaishou, and more.
1 — Clickbait bias
The ubiquity of clickbait poses a significant challenge to recommender systems: if a model is trained using clicks as positives, it risks favouring clickbait content. Covington et al. (2016) propose weighted logistic regression to combat this. Applied to YouTube video recommendations, the technique leverages watch time to prioritize content with higher expected watch times, ultimately pushing clickbait lower in the recommendations.
Mathematically, it can be shown that such a weighted logistic regression model learns odds that approximate the expected watch time of a video: since each positive example contributes its watch time as weight, the learned odds work out to (total watch time) / (number of negatives), which is close to the expected watch time per impression when click probabilities are small. At serving time, videos are ranked by their predicted odds, placing videos with long expected watch times at the top of the recommendations and clickbait (with the lowest expected watch times) at the bottom.
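To make this concrete, here is a minimal sketch of the idea (my own illustration with scikit-learn and synthetic data; YouTube's production model is a deep neural network, not shown here). Positives are weighted by watch time, negatives get unit weight, and serving ranks by predicted odds:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per impression.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                  # impression features
clicked = rng.integers(0, 2, size=1000)         # 1 if the video was clicked
watch_time = np.where(clicked == 1, rng.exponential(120, size=1000), 0.0)

# Weighted logistic regression: positive examples are weighted by
# their watch time in seconds, negatives get unit weight.
sample_weight = np.where(clicked == 1, watch_time, 1.0)
model = LogisticRegression()
model.fit(X, clicked, sample_weight=sample_weight)

# At serving time, rank by the predicted odds p/(1-p), which
# approximate the expected watch time of each video.
p = model.predict_proba(X)[:, 1]
ranking = np.argsort(-(p / (1 - p)))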
2 — Duration bias
While weighted logistic regression addresses clickbait, it introduces a new bias of its own: duration bias, a tendency to favour long videos simply because they accumulate more watch time.
Think about a video catalogue that contains 10-second short-form videos along with 2-hour long-form videos. A watch time of 10 seconds means something completely different in the two cases: it’s a strong positive signal in the former and a weak positive (perhaps even a negative) signal in the latter. Yet, the Covington approach would not be able to distinguish between these two cases and would bias the model in favour of long-form videos (which generate longer watch times simply because they’re longer).
A solution to duration bias, proposed by Zhan et al. (2022) from Kuaishou, is quantile-based watch-time prediction.
The key idea is to bucket all videos into duration quantiles, and then bucket all watch times within a duration bucket into quantiles as well. For example, with 10 quantiles, such an assignment could look like this:
(training example 1)
video duration = 120min --> video quantile 10
watch duration = 10s --> watch quantile 1
(training example 2)
video duration = 10s --> video quantile 1
watch duration = 10s --> watch quantile 10
...
By translating all time intervals into quantiles, the model learns that 10s is “high” in the latter example but “low” in the former, or so the authors hypothesize. At training time, we provide the model with the video's duration quantile and task it with predicting the watch quantile. At inference time, we simply rank all videos by their predicted watch quantiles, which are now de-confounded from the video duration itself.
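As a rough sketch of this bucketing step (my own illustration with pandas and synthetic data; the paper's exact binning may differ), the two quantile assignments could be computed like this:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical interaction log: durations from 10s shorts to 2h movies,
# with watch time at most the video duration.
df = pd.DataFrame({"duration": rng.uniform(10, 7200, size=10_000)})
df["watch_time"] = df["duration"] * rng.beta(2, 5, size=len(df))

# Step 1: bucket videos into 10 duration quantiles.
df["duration_q"] = pd.qcut(df["duration"], q=10, labels=False)

# Step 2: within each duration bucket, bucket watch times into 10
# quantiles; this is the label the model is trained to predict.
df["watch_q"] = df.groupby("duration_q")["watch_time"].transform(
    lambda s: pd.qcut(s, q=10, labels=False, duplicates="drop")
)

# A 10s watch now lands in a low quantile for long videos and a high
# quantile for short ones, de-confounding watch time from duration.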
And indeed, this approach appears to work: in online A/B tests, the authors report a 0.5% improvement over the weighted logistic regression baseline.
The results show that removing duration bias can be a powerful approach on platforms that serve both long-form and short-form videos. Perhaps counter-intuitively, removing bias in favour of long videos improves overall user watch times.
3 — Position bias
Position bias occurs when higher-ranked items garner more engagement solely because of their position, not their content quality. Techniques like rank randomization (serving a small slice of traffic with randomly shuffled results) and intervention harvesting offer remedies.
Particularly problematic is that position bias will always make our models look better on paper than they actually are. Our models may be slowly degrading in quality, but we wouldn't know until it's too late (and users have churned away). It is therefore important, when working with recommender systems, to monitor multiple quality metrics, including metrics that quantify user retention and the diversity of recommendations.
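As an illustration of the rank-randomization recipe (a sketch under the assumption that a small slice of traffic serves randomly shuffled results; all names and numbers are made up), one can estimate per-position examination propensities and use them as inverse-propensity weights:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical click log from randomized traffic: since results were
# shuffled, CTR differences across positions reflect position alone.
log = pd.DataFrame({"position": rng.integers(1, 11, size=50_000)})
log["clicked"] = rng.random(len(log)) < 0.3 / log["position"]

# Examination propensity per position, normalized to position 1.
propensity = log.groupby("position")["clicked"].mean()
propensity = propensity / propensity.max()

# On regular traffic, weight each click by the inverse propensity of
# the position it was shown at, so deep-position clicks count for more.
ips_label = log["clicked"] / log["position"].map(propensity)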
4 — Popularity bias
Popularity bias refers to the tendency of a model to give higher rankings to items that are more popular overall (because they've been rated by more users), rather than basing rankings on their actual quality or relevance for a particular user. This can lead to a distorted ranking, where less popular or niche items that could be a better fit for the user's preferences are not given adequate consideration.
One remedy is to correct the predicted logit for each user/video pair by subtracting the log of the video's popularity:
logit(u,v) <-- logit(u,v) - log(P(v))
where P(v) is the popularity of video v, i.e. its overall probability of being watched in the training data.
Of course, the right-hand side is equivalent to:
log[ odds(u,v)/P(v) ]
In other words, the predicted odds for a user/video pair are simply normalized by the video's popularity: extremely high odds from popular videos count as much as moderately high odds from not-so-popular videos. And that's the entire magic.
Indeed, the magic appears to work: in online A/B tests, the authors find a 0.37% improvement in overall user engagement with the de-biased ranking model.
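As a minimal sketch of the correction itself (function name and the serving-time placement are my assumptions; in practice the adjustment is typically applied during training), it is essentially a one-liner:

import numpy as np

def debias_logits(logits: np.ndarray, video_counts: np.ndarray) -> np.ndarray:
    # Subtract log-popularity from raw logits, i.e. divide the odds by P(v).
    p_v = video_counts / video_counts.sum()  # empirical popularity P(v)
    return logits - np.log(p_v)

# Three candidate videos; the first is hugely popular.
logits = np.array([4.0, 2.5, 2.0])
counts = np.array([1_000_000, 5_000, 100])
ranking = np.argsort(-debias_logits(logits, counts))  # niche videos can now win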
5 — Single-interest bias
Suppose you watch mostly drama movies, but sometimes you like to watch a comedy, and from time to time a documentary. You have multiple interests, yet a ranking model trained to maximize your watch time may over-emphasize drama movies because that’s what you’re most likely to engage with. This is single-interest bias, the failure of a model to understand that users inherently have multiple interests and preferences.
To remove single-interest bias, a ranking model needs to be calibrated. Calibration simply means that, if you watch drama movies 80% of the time, then the model’s top 100 recommendations should include around 80 drama movies (and not 100).
Netflix’s Harald Steck (2018) demonstrates the benefits of model calibration with a simple post-processing re-ranking step that trades off relevance against calibration, which he quantifies with KL divergence scores between the user's historical interest distribution and that of the recommendations. The resulting movie recommendations are more diverse (in fact, as diverse as the actual user preferences) and result in improved overall watch times.
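A sketch of what such a calibrated re-ranker could look like (a greedy trade-off between relevance and a KL-based calibration penalty; the exact objective, weighting, and function names are my assumptions, not Netflix's published code):

import numpy as np

def kl(p, q, eps=1e-8):
    # KL divergence between two (possibly unnormalized) distributions.
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def calibrated_rerank(scores, genres, target_dist, k=100, lam=0.5):
    # Greedily pick k items, trading off relevance (scores) against how
    # closely the genre mix of the list matches the user's historical
    # genre distribution (target_dist).
    chosen, counts = [], np.zeros(len(target_dist))
    candidates = set(range(len(scores)))
    for _ in range(min(k, len(scores))):
        best, best_obj = None, -np.inf
        for i in candidates:
            trial = counts.copy()
            trial[genres[i]] += 1
            obj = (1 - lam) * scores[i] - lam * kl(target_dist, trial)
            if obj > best_obj:
                best, best_obj = i, obj
        chosen.append(best)
        counts[genres[best]] += 1
        candidates.remove(best)
    return chosen

# E.g. for a user who watches 80% drama (genre 0) and 20% comedy (genre 1):
# calibrated_rerank(scores, genres, target_dist=[0.8, 0.2], k=100)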
Takeaways
1. Clickbait Challenge: Addressed by weighted logistic regression, prioritizing content with higher expected watch times over sensational clickbait.
2. Duration Dilemma: Quantile-based watch-time prediction mitigates bias toward longer videos, showing a 0.5% improvement over previous methods.
3. Position Pitfall: Techniques like rank randomization counter position bias, ensuring recommendations reflect user preferences, not just rank.
4. Popularity Predicament: Normalizing predicted odds by video popularity combats popularity bias, leading to a 0.37% improvement in overall user engagement.
5. Diverse User Preferences: Calibrated re-ranking acknowledges users’ multiple interests, resulting in more diverse and satisfying content suggestions.