Signals: Sustained CTR Growth 18 Months On
Authored By: Ritvvij Parrikh
Co-authored By: Ashish Jaiswal and Manish Mishra
Quick Recap: In February 2022, we began work on Signals, a publication-agnostic personalization platform for news. Within six months, we had built a simple ranking model for TOI+, our subscription product, which resulted in a minimum 100% boost in CTR. Encouraged by this success, we formally launched work on a large-scale collaborative filtering model for the larger business. By March 2023, we had implemented it on our website in our recirculation widgets, achieving an 85% boost in CTR.
A lot has happened since then.
The engineering teams put in a tremendous amount of work to improve the reliability of our infrastructure: data ingestion, low-latency data pipelines, regular data audits, scaling of servers, and tweaking caching to serve millions of audiences. A deep dive into these topics will be covered in a separate blog post.
In this post, we’ll focus primarily on the evolution of our product and machine learning functions.
Outcomes
Compared to editorial distribution, Signals continues to sustain its CTR growth even after 1.5 years:
The last point is critical because it demonstrates that tapping into evergreen content is a ‘distribution problem.’
Improvements to the Model
Self-Learning (and Data Drift):
One of the major reasons the CTR uplift has been sustained (and has grown in many situations) is that we’ve resisted the urge to add any business rules to the model. It is completely self-learning. Whenever we discover a new negative phenomenon, instead of adding business rules as a band-aid solution, our ML team tries to discover causality (why it is happening) in the data and solve it mathematically.
Here’s an example:
Phenomenon: Until just a couple of months ago, most of our analysis was at the day level. When we explored the data at the hour level, we realized that the model’s performance was extremely high during the morning but progressively worsened from 1 PM until 4 AM the next morning.
Cause (Data Drift): After weeks of exploratory data analysis and tests, the team established a direct causal relationship between the quality of personalization and the quantity and diversity of fresh content supply available. While our digital editorial team publishes a steady quantity of stories across multiple shifts, at 4 AM, our digital CMS (Denmark) pulls all stories from yesterday’s newspaper (from across all editions) and auto-publishes them. This gives the model a large burst of fresh supply in the morning, which thins out as the day progresses.
Solution: The team started dynamically changing hyperparameters of the model to account for this phenomenon, which stopped the degradation of CTR in the evening.
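As a rough illustration of the idea, here is a minimal sketch of supply-aware hyperparameters. The post does not say which hyperparameters were changed or how, so everything here is an assumption made for illustration: the parameter names (candidate_window_hours, exploration_rate), the thresholds, and the linear interpolation are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    # Hypothetical hyperparameters; the real ones are not named in the post.
    candidate_window_hours: int  # how far back to look for candidate stories
    exploration_rate: float      # share of slots given to less-proven stories


def params_for_supply(fresh_stories_last_3h: int,
                      low_supply: int = 20,
                      high_supply: int = 200) -> ModelParams:
    """Interpolate hyperparameters from the current fresh-content supply."""
    # Clamp the supply into [low_supply, high_supply] and scale it to 0..1.
    clamped = max(low_supply, min(fresh_stories_last_3h, high_supply))
    richness = (clamped - low_supply) / (high_supply - low_supply)
    # Scarce supply: look further back (up to 48 hours) and explore less.
    window = round(48 - richness * (48 - 12))
    exploration = 0.05 + richness * 0.10
    return ModelParams(window, round(exploration, 3))


morning = params_for_supply(200)  # right after the 4 AM bulk publish
evening = params_for_supply(10)   # fresh supply has thinned out
```

The point of the sketch is only that the parameters become a function of observed supply rather than fixed constants, so the model adapts without a hand-written business rule per hour of the day.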
Progressively Personalize:
All good search systems start wide and progressively narrow down to what the user sees. At each phase of narrowing, the number of data points considered increases.
At the training layer, conceptually it works like this:
At the model serving API layer, it further optimizes while serving the API output:
Remove Stale Stories:
Personalization teams in news companies have a major advantage over those at social media firms.
However, there’s one big challenge: Audiences do not expect to see stale content on their feed. Identifying and removing such stories is an extremely hard problem to solve.
Our first attempt at solving this problem was to train a binary classifier. However, the model occasionally produced false positives, leading to user complaints about stale stories.
Next, we started forecasting each story’s CTR based on its prior day’s performance, which greatly reduced the number of stale stories being recommended.
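A minimal sketch of what a forecast-based staleness filter could look like. This is not the production model: the smoothing prior, the decay factor, the threshold, and the dictionary field names are all assumptions chosen for illustration.

```python
def forecast_ctr(prior_day_clicks: int,
                 prior_day_impressions: int,
                 decay: float = 0.5,
                 prior_ctr: float = 0.02,
                 prior_weight: int = 100) -> float:
    """Forecast today's CTR from yesterday's performance (naive sketch)."""
    # Smooth yesterday's CTR toward a prior so that stories with very few
    # impressions are not judged on noise alone.
    smoothed = (prior_day_clicks + prior_ctr * prior_weight) / (
        prior_day_impressions + prior_weight
    )
    # Assume interest in a news story decays from one day to the next.
    return smoothed * decay


def drop_stale(stories: list[dict], min_ctr: float = 0.01) -> list[dict]:
    """Keep only stories whose forecast CTR clears the threshold."""
    return [
        s for s in stories
        if forecast_ctr(s["clicks"], s["impressions"]) >= min_ctr
    ]


stories = [
    {"id": "still-relevant", "clicks": 500, "impressions": 10_000},
    {"id": "gone-stale", "clicks": 20, "impressions": 10_000},
]
kept = drop_stale(stories)
```

Unlike a binary classifier, a forecast yields a continuous score, so the cut-off can be tuned without retraining.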
Forgetting:
In civil law, there is a concept of the statute of limitations, a period within which legal proceedings may be initiated. Similarly, in personalization, it is important for the model to forget user preferences after a certain period. This keeps recommendations adaptive to recent trends and interests.
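One common way to implement forgetting is to down-weight older interactions and drop them entirely past a cut-off. The sketch below assumes an exponential half-life decay plus a hard maximum age; the post does not state which mechanism Signals uses, and the function name and default values are hypothetical.

```python
from collections import defaultdict


def topic_affinities(events, now_day, half_life_days=14.0, max_age_days=90.0):
    """Score a user's topics from (topic, day_of_click) events.

    Recent clicks count fully, older clicks count less, and clicks older
    than max_age_days are forgotten entirely.
    """
    scores = defaultdict(float)
    for topic, day in events:
        age = now_day - day
        if age < 0 or age > max_age_days:
            continue  # past the "statute of limitations", or a bad timestamp
        # The weight halves every half_life_days.
        scores[topic] += 0.5 ** (age / half_life_days)
    return dict(scores)


events = [("cricket", 100), ("politics", 86), ("films", 0)]
affinities = topic_affinities(events, now_day=100)
```

Here the click on "films" from 100 days ago no longer influences the profile, while the two-week-old "politics" click counts half as much as today’s "cricket" click.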
Change in Perspective
A year and a half ago, we thought of personalization as a layer sitting on top of everything else. Now we increasingly see personalization impacting every aspect of the business. Here are three examples:
Mitigate ‘Concept Drift’ with the Product Team:
Concept Drift occurs when the relationship between input and output starts changing. This can happen if the input data collected isn’t representative of reality (i.e., what users are experiencing) or if its definition has changed. Such drifts would typically bypass most data audit systems because the data’s valid values remain the same. Eventually, this makes that particular data point no longer reliable or fully accurate.
For example, let’s say there is a data point that is true if a user sees a paywall on a story. If, after a feature release, users still see the paywall but this data point is now false, it results in silent concept drift.
So far, there doesn’t seem to be an automated way to catch this. Hence, just as teams get SEO and performance sign-off before launching a feature on the browser, we will need to introduce a review process for data concept checks.
Mitigate ‘Bias’ with the Design Team:
There’s a reason why most algorithmically distributed internet products serve content in simple, straightforward feeds. This is true for Google, LinkedIn, X, Facebook, Instagram, Google News, News Inshorts, etc. This is because:
The user experience adds bias to clickstream data, which can negatively impact a model's performance. Hence, it is critical for models to normalize and de-bias the input data. However, this normalization step requires input data, which is only possible to collect reliably if the UX is a simple feed.
Additionally, all personalization is essentially a search and sort problem. Therefore, you won’t find ‘widgets’ or ‘sections’ in the middle of the feed, as widgets distort the sort order of the algorithm.
To ensure this, we need to work closely with the platform design teams to drive decisions.
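To make the de-biasing point concrete, here is a minimal sketch of one well-known normalization, clicks over expected clicks (COEC), which corrects for the fact that higher slots in a feed get clicked more regardless of content. It is offered as an example of the general technique, not as the method Signals uses, and the log format is an assumption.

```python
from collections import defaultdict


def position_ctr(logs):
    """Average CTR per feed position, from (story_id, position, clicked) rows."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for _, position, clicked in logs:
        impressions[position] += 1
        clicks[position] += int(clicked)
    return {pos: clicks[pos] / impressions[pos] for pos in impressions}


def coec(logs):
    """Clicks over expected clicks per story; above 1.0 beats its slots."""
    baseline = position_ctr(logs)
    actual = defaultdict(int)
    expected = defaultdict(float)
    for story, position, clicked in logs:
        actual[story] += int(clicked)
        expected[story] += baseline[position]
    return {s: actual[s] / expected[s] for s in actual if expected[s] > 0}


logs = [
    ("a", 1, True), ("a", 1, False), ("b", 1, True), ("b", 1, True),
    ("a", 2, False), ("b", 2, False), ("c", 2, True), ("c", 2, False),
]
scores = coec(logs)
```

Story "c" was only ever shown in the weaker second slot, yet it scores highest once position is accounted for. This kind of correction depends on knowing each impression’s position, which is straightforward in a simple feed and much harder when widgets and sections interrupt it.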
Factor in ‘Cost of Revenue’ in Distribution With The Business Team:
The Times of India digital operates multiple revenue models — direct ads, indirect ads, sponsored content, subscription, affiliate, microtransactions, etc. — on the same real estate (website or app) targeting the same users. Previously, what was shown to each user was governed by business rules. For example, story numbers 6 and 10 in the feed would be affiliate stories. However, data shows that such rules drastically reduce engagement levels.
We’ve come to realize that our ‘recommender systems’ need to not just personalize but also ‘maximize revenue’ while balancing the trade-offs of these competing revenue models.
The closest analogy we found is from the Mutual Fund industry. Depending on each user’s risk appetite, a mutual fund builds a portfolio of investments across various asset classes — gold, real estate, equity, etc. Similarly, our recommender models need to compute the risk/reward from each revenue model for a particular user to maximize revenue while maintaining engagement.
However, this again requires working closely with business teams to ensure their targets are met, regardless of how exposure is managed by Signals.
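As a simplified illustration of the portfolio idea, the sketch below ranks candidates by a blend of expected revenue and engagement instead of pinning revenue models to fixed slots. The scoring formula, the field names, and the numbers are all hypothetical; a real system would also need per-user estimates and constraints agreed with the business teams.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    story_id: str
    revenue_model: str        # e.g. "ads", "subscription", "affiliate"
    p_click: float            # predicted click probability for this user
    revenue_per_click: float  # expected revenue if the user clicks


def rank_by_blended_value(candidates, engagement_weight=1.0):
    """Sort by expected revenue plus a weighted engagement term."""
    def value(c):
        # Every click is worth its revenue plus a fixed engagement bonus.
        return c.p_click * (c.revenue_per_click + engagement_weight)
    return sorted(candidates, key=value, reverse=True)


candidates = [
    Candidate("s1", "affiliate", p_click=0.01, revenue_per_click=5.0),
    Candidate("s2", "ads", p_click=0.10, revenue_per_click=0.2),
    Candidate("s3", "subscription", p_click=0.05, revenue_per_click=2.0),
]
balanced = [c.story_id for c in rank_by_blended_value(candidates)]
revenue_only = [c.story_id for c in rank_by_blended_value(candidates, 0.0)]
```

With the engagement term switched off, the low-engagement affiliate story moves above the ad-supported one; with it on, the ordering favors stories the user is likely to click. The weight is the knob that trades revenue against engagement.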
And before we sign off, the customary hiring plug.
There is a lot of hype about AI in the market. However, there are few jobs in the market where companies are investing in building in-house models that scale cost-effectively and handle subtle nuances. More importantly, you’ll get to work under the tutelage of an extremely senior data science and data pipeline team that has decades of experience putting such products into production.
We are looking to hire Java and Android developers.
Additionally, we are considering hiring an AI Training Editor. Personalization is inherently an editorial product. It involves editorial judgment regarding what should and shouldn’t be shown to users. In this regard, there is much to learn from editorial leads about how they make editorial judgments. Deep conversations with them can lead to better understanding and formulation of nuances. Additionally, manually labeled datasets can help with rolling out algorithms. Yes, this is tedious work, but you’ll get to observe how AI works firsthand.
Join Us and #TakeUsToTheNextLevel: https://timesinternet.in/careers
Credits: