
Modern Visual RecSys: How does a recommender work?

Kai Xin Thia

Head of AI & Analytics, Group Tech Office, ST Engineering

Published: March 15, 2020

I have worked in the data industry for over seven years and have had the privilege of designing, building, and deploying two recommender systems (RecSys) that went on to serve millions of customers. In this series of articles, I will introduce modern approaches to visual recommenders by walking through case studies with code and sharing some of my experience designing RecSys.

This is part of my Modern Visual RecSys series; feel free to check out the rest of the series at the end of the article.

RecSys Basics — Spotify Case Study

We begin with a case study of Spotify to understand how RecSys works and introduce several key concepts, including a modern approach called convolutional neural networks (CNN), applied to music.

My “discover weekly” recommendations from Spotify

Let us take a look at my personalized music recommendations from Spotify. The playlist contains a mix of Chinese/Japanese/English pop music, with new music and old tunes going back some 20 years. A few observations as I scroll through the recommendations:

None of the music is from artists that I “save to liked”.
The genre is similar to what I usually listen to.
The tunes are similar to what I usually listen to.
There is a mix of new and old songs.

It seems that this recommendation product is trying to help me discover new music that is familiar yet different from my usual listening habits. But how is this achieved? Chris Johnson from Spotify has the following slide on the architecture of Discover Weekly:

Source: slide from presentation: From Idea to Execution: Spotify’s Discover Weekly by Chris Johnson

We see that there are three main methods employed (the red box):

Collaborative filtering (CF) using user behavior (play logs) and music content (track metadata).
Natural Language Processing (NLP) and text mining/scraping of news/blogs/text and music content (track metadata).
Audio models that analyze the raw audio data.

The result is a “Spotify blob” for each user, with an ever-shifting musical preference based on user interactions and an expanding library of music from Spotify. Visually, the goal of Discover Weekly is to find these white contour lines that cut across the user’s musical preferences.

Source: The magic that makes Spotify’s Discover Weekly playlists so damn good by Quartz & Spotify

Let us dive deeper into each of these methods.

Collaborative filtering (CF)

Source: The magic that makes Spotify’s Discover Weekly playlists so damn good by Quartz & Spotify

CF is the classic method employed across different RecSys. It simply takes user interactions (your clicks, saves, likes, purchases, etc.) and matches them with other users in the system with similar tastes (in music, films, fashion, etc.).

CF assumes that users with similar tastes will appreciate content from others within the same community.
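To make the idea concrete, here is a minimal user-based CF sketch in plain Python. It is an illustration under assumptions, not how Spotify implements CF: the toy play counts in PLAYS and the helper names cosine and recommend are hypothetical, and cosine similarity is just one common choice for measuring similar taste.

```python
import math

# Toy interaction data (hypothetical): user -> {track: play count}.
PLAYS = {
    "ann": {"song_a": 5, "song_b": 3},
    "bob": {"song_a": 4, "song_b": 2, "song_c": 6},
    "cat": {"song_d": 7, "song_e": 1},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(user, plays, top_n=3):
    """Score tracks the user has not played by the similarity-weighted
    play counts of other users, and return the best-scoring tracks."""
    scores = {}
    for other, their_plays in plays.items():
        if other == user:
            continue
        sim = cosine(plays[user], their_plays)
        if sim <= 0:
            continue  # no shared taste, so this user contributes nothing
        for track, count in their_plays.items():
            if track not in plays[user]:
                scores[track] = scores.get(track, 0.0) + sim * count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ann", PLAYS))
```

Here "ann" overlaps with "bob" on two songs, so the track that only "bob" has played is suggested to her. "cat" shares no tracks with anyone, so the sketch returns nothing for her.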

There are drawbacks such as:

Echo chambers (Facebook showing you left/right-wing posts over and over again based on your reading behavior)
Safe but boring recommendations (recommending another Artist A song when you know that I am a big fan of Artist A)
Cold start problem where CF cannot match new items/users due to a lack of data — CF will always need to be paired with a backup plan in deployment (top/most popular products for example)
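The backup plan for cold start can be sketched as a simple popularity fallback. This is a minimal sketch under assumptions: the function names most_popular and recommend_or_fallback, the personalized callback, and the toy data are all hypothetical, and a real deployment would likely use more refined fallbacks.

```python
from collections import Counter


def most_popular(plays, top_n=3):
    """Rank tracks by total play count across all users."""
    totals = Counter()
    for user_plays in plays.values():
        totals.update(user_plays)  # adds each track's play count
    return [track for track, _ in totals.most_common(top_n)]


def recommend_or_fallback(user, plays, personalized, top_n=3):
    """Use the personalized recommender when the user has history and it
    returns something; otherwise fall back to the most popular tracks."""
    if plays.get(user):
        recs = personalized(user, plays, top_n)
        if recs:
            return recs
    return most_popular(plays, top_n)


# Hypothetical data: user -> {track: play count}.
PLAYS = {
    "ann": {"song_a": 5, "song_b": 3},
    "bob": {"song_a": 4, "song_c": 6},
}

# A brand-new user has no history, so the popularity list is returned.
print(recommend_or_fallback("new_user", PLAYS, lambda u, p, n: [], top_n=2))
```

Any CF recommender with the same (user, plays, top_n) signature can be passed in as the personalized argument.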

There are various implementations of CF, for example Spark’s alternating least squares (ALS), FastAI’s collab, and Surprise (for explicit/user rating data). Check out the further readings section for more tutorials.

Natural Language Processing (NLP)

One way to handle the cold start problem, especially for new releases, is to scrape the internet news/blogs and fill in metadata information about the song (artist, title, mood {happy, love…}, genre {pop, Korean}, etc.) with web scrapers like Beautiful Soup, Scrapy, etc.

Source: slide from presentation: From Idea to Execution: Spotify’s Discover Weekly by Chris Johnson

With the scraped textual data, together with details from the playlist, it is possible to associate keywords with individual artists/playlists.

Modern approaches make use of word embeddings to construct sentence/document vectors: mathematical representations that allow for comparison across the vector space.

Common techniques are word2vec, doc2vec, and Latent Dirichlet Allocation (LDA). Vectorization is key to the content-based recommender my team built at Tech in Asia.

Source: Introducing Tech in Asia’s unique content recommender by Will Ho & Joshua Lim
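The vector comparison described above can be sketched with plain term counts. Note the swap: this uses a bag-of-words vector as a simple stand-in for learned embeddings such as word2vec or doc2vec, so it only captures exact keyword overlap. The artist descriptions in DOCS and the helper names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a sparse term-count vector (a bag-of-words stand-in
    for learned embeddings)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical keywords associated with three artists after scraping.
DOCS = {
    "artist_a": "happy upbeat korean pop dance",
    "artist_b": "upbeat pop dance party",
    "artist_c": "slow sad acoustic ballad",
}


def most_similar(name, docs):
    """Return the other entry whose keyword vector is closest to `name`."""
    target = vectorize(docs[name])
    others = {
        other: cosine(target, vectorize(text))
        for other, text in docs.items()
        if other != name
    }
    return max(others, key=others.get)


print(most_similar("artist_a", DOCS))
```

Because the comparison needs only text, a newly released artist with no play logs can still be placed next to similar artists, which is what makes this approach useful for cold start.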

Audio Models

Sander Dieleman (Research Scientist at DeepMind) once interned at Spotify and wrote a great article on Recommending music on Spotify with deep learning. He used a technique called convolutional neural networks (CNN) that we will cover in the later chapters. Intuitively, our goal is for each filter (shown as columns in the image below) to pick up a distinct musical feature.

Visualization of the filters learned in the first convolutional layer. The time axis is horizontal, and the frequency axis is vertical. Source: Recommending music on Spotify slides by Sander Dieleman

If we zoom in to take a look at the specific filters, we can pick up trends as noted by Sander:

Closeup of filters 14, 242, 250 and 253. Source: Recommending music on Spotify slides by Sander Dieleman

"Note that the time axis is horizontal, the frequency axis is vertical (Frequency increases from top to bottom). Negative values are red, positive values are blue, and white is zero".
"Filter 14 seems to pick up vibrato singing. [Notice the recurring blue shades for column 14 across different frequencies]"
"Filter 242 picks up some kind of ringing ambience. [Notice the blue stripe +red base]"
"Filter 250 picks up vocal thirds, i.e., multiple singers singing the same thing, but the notes are a major third (4 semitones) apart. [Notice the neat recurring alternation between red and blue rows]"
"Filter 253 picks up various types of bass drum sounds. [Notice that most of the music exists within a small range of frequencies at the top]".

These musical patterns act as a musical signature, allowing Spotify to mix and match songs of similar signatures to generate playlists that sound familiar but with degrees of controlled novelty for the user.
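As a rough intuition for how filters turn audio into a signature, here is a minimal sketch in plain Python. It is not Sander's network: the tiny spectrogram SPEC, the hand-written filters in KERNELS, and the function names are hypothetical, and a real CNN learns its filters from data and stacks many layers. Each filter slides along the time axis, and the strongest response per filter becomes one entry of the signature.

```python
def conv1d_time(spectrogram, kernel):
    """Slide a (frequency x width) filter along the time axis.

    spectrogram: list of frequency rows, each a list over time frames.
    kernel: list of frequency rows, each `width` values long.
    Returns one ReLU-activated response per time position.
    """
    n_freq = len(spectrogram)
    n_time = len(spectrogram[0])
    width = len(kernel[0])
    out = []
    for t in range(n_time - width + 1):
        total = 0.0
        for f in range(n_freq):
            for k in range(width):
                total += spectrogram[f][t + k] * kernel[f][k]
        out.append(max(total, 0.0))  # ReLU: keep only positive responses
    return out


def signature(spectrogram, kernels):
    """Max-pool each filter's responses over time: one number per filter."""
    return [max(conv1d_time(spectrogram, k)) for k in kernels]


# Hypothetical 2-band, 4-frame spectrogram (rows = frequency bands).
SPEC = [
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]

# Two hand-written filters: one fires on energy in the second band,
# the other on energy in the first band.
KERNELS = [
    [[-1, -1], [1, 1]],
    [[1, 1], [-1, -1]],
]

print(conv1d_time(SPEC, KERNELS[0]))
print(signature(SPEC, KERNELS))
```

Two songs with similar signature vectors respond to the same filters, which is the sense in which signatures can be compared and matched.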

The data scientists can always dial up or dial down the novelty mix to the musical signature based on user response to the recommendations. Such is the power of modern tools like CNN.
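One way to picture such a dial is a single weight that blends similarity with novelty when ranking candidates. This is purely a hypothetical sketch: the source does not describe how Spotify tunes novelty, and the scores, the novelty_weight parameter, and the function name blend_scores are assumptions made for illustration.

```python
def blend_scores(candidates, novelty_weight):
    """Rank tracks by a weighted mix of similarity and novelty.

    candidates: {track: (similarity, novelty)}, both scores in [0, 1].
    novelty_weight: 0.0 ranks purely by similarity, 1.0 purely by novelty.
    """
    if not 0.0 <= novelty_weight <= 1.0:
        raise ValueError("novelty_weight must be between 0 and 1")
    scored = {
        track: (1 - novelty_weight) * sim + novelty_weight * nov
        for track, (sim, nov) in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical candidates: (similarity to the user's signature, novelty).
CANDIDATES = {
    "familiar_hit": (0.9, 0.1),
    "deep_cut": (0.6, 0.8),
}

print(blend_scores(CANDIDATES, novelty_weight=0.0))
print(blend_scores(CANDIDATES, novelty_weight=0.5))
```

With the weight at 0.0 the familiar track ranks first; raising it to 0.5 moves the less familiar track to the top.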

What have we learned

RecSys are very interesting models to explore. Even a seemingly simple music playlist recommendation can involve a diverse array of models that bring together user interactions, content, external data, and domain-specific techniques such as the audio models in this Spotify case study.

In the next chapter, we will learn how to design a recommender.

Reflections

Take a look at your recommendations on Spotify (or Amazon, Netflix, YouTube, or any other personalized service you use).

Are they relevant to you? What % of the recommendations are spot on? What % is terrible?
How will you improve the recommendations?
Will you put more weight on recent behaviors vs. historical?
How will you introduce new products?
What will you show new users?
How will you design a recommender that keeps up with the latest trends?

Explore the rest of Modern Visual RecSys Series

How does a Recommender Work? [Foundational][we are here]
How to Design a Recommender? [Foundational]
Intro to Visual RecSys [Core]
Convolutional Neural Networks Recommender [Pro]
COVID-19 Case Study with CNN [Pro]
Building a Personalized Real-Time Fashion Collection Recommender [Pro]
Temporal Modeling [Pro]
The Future of Visual Recommender Systems: Four Practical State-Of-The-Art Techniques [Foundational]

Series labels:

Foundational: general knowledge and theories, minimum coding experience needed.
Core: more challenging materials with code.
Pro: Difficult materials and code, with production-grade tools.
