
How we built a media recommender with ChatGPT and without training data

Yuri Borisov, PhD

Yuri Borisov, PhD | Co-Founder & CEO at NextCreature / Randowise Oü | ChatGPT can do much more with the right software around it

Published: August 28, 2024

All-in-one recommenders that give good suggestions for movies, TV shows, anime series, and books (fiction and nonfiction) are not that common. In fact, we couldn’t find one, so why not create it ourselves?

Spoiler

Here is what the final version looks like: https://TopN.ai

Task definition: why it is hard to build a media recommender

The video below briefly presents the task definition and the main technical challenges.

Traditionally, it was impossible to create a recommender without training data, statistics, or logs. The task becomes even more challenging, and the training data requirements even more severe, if we want to recommend different content types: movies, TV shows, books, and so on. Fortunately, there is an alternative approach that does not rely on training data: use GPT models to extract the data needed.

Solution: how the recommender was actually built

Below is the video that walks you through the entire development process. The first 3 minutes introduce the concepts and outline the solution; the remaining 10 minutes show and explain the actual development process.

Here are the key points:

Invent collection titles. ChatGPT can be used in an automatic or semi-automatic manner to generate a diverse set of collection titles relatively quickly.
Populate collections with the actual content. For each collection title, we need to generate relevant movies, TV shows, anime series, books (fiction), and books (educational). This step is fully automated with ChatGPT.
Tag each collection title. We just need to ask ChatGPT to come up with 10 tags that characterise each collection. Tags are essential since they make the user experience interactive. This step is fully automated with ChatGPT.
Calculate the Tags Similarity Matrix. We want users to be able to pick relevant tags quickly. For example, if the tag “alien” is selected, we might want to recommend other relevant tags to the user (say, “alien life” or “alien worlds”). It's the Similarity Matrix that enables this functionality.
Use a proper web interface. Once all the data is generated, we can put it into an intuitive, interactive interface.
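To make the Tags Similarity Matrix step more concrete, here is a minimal sketch in Python. The article does not say how the similarities are actually calculated, so this sketch assumes one simple option: two tags are similar when they appear on the same collections (Jaccard similarity over the sets of collections carrying each tag). The sample collections and the function names tag_similarity and related_tags are invented for illustration.

```python
from itertools import combinations

# Hypothetical sample data: collection title -> tags (invented for illustration).
collections = {
    "Top 5 Alien Invasion Thrillers": {"alien", "invasion", "thriller"},
    "Best Stories About Alien Worlds": {"alien", "alien worlds", "exploration"},
    "First Contact Classics": {"alien", "alien life", "first contact"},
    "Space Exploration Epics": {"exploration", "alien worlds", "space"},
}


def tag_similarity(collections):
    """Return {(tag_a, tag_b): Jaccard similarity} for tags that co-occur."""
    # Map each tag to the set of collections that carry it.
    tag_to_titles = {}
    for title, tags in collections.items():
        for tag in tags:
            tag_to_titles.setdefault(tag, set()).add(title)

    sim = {}
    for a, b in combinations(sorted(tag_to_titles), 2):
        shared = tag_to_titles[a] & tag_to_titles[b]
        if shared:
            union = tag_to_titles[a] | tag_to_titles[b]
            score = len(shared) / len(union)
            # Store both directions so lookups by either tag are cheap.
            sim[(a, b)] = sim[(b, a)] = score
    return sim


def related_tags(tag, sim, top_n=3):
    """Suggest the tags most similar to the selected one."""
    scored = [(other, s) for (t, other), s in sim.items() if t == tag]
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in scored[:top_n]]


sim = tag_similarity(collections)
print(related_tags("alien", sim))
```

Other similarity measures (for example, comparing text embeddings of the tags) would fit the same interface; co-occurrence is used here only because it needs nothing beyond the generated collections.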

Analysis of the media recommender

Here is a video with our brief analysis of the recommender we built.

It’s important to mention that once the recommender is built (meaning all data is generated and tag similarities are calculated), ChatGPT is no longer needed for the recommender to work.

Our experience with the recommender, as well as that of our early users, is positive. However, not all recommended collections are perfect. In the next section we discuss limitations and future lines of work.
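As an illustration of why no model call is needed at runtime, here is a small Python sketch that serves recommendations purely from precomputed data. The JSON layout, the sample entries, and the function name find_collections are assumptions made for this example, not the actual TopN.ai data format.

```python
import json

# Hypothetical precomputed data, as it might be exported after generation.
DATA_JSON = """
[
  {"title": "Top 5 Alien Invasion Thrillers with Unique Tactics",
   "tags": ["alien", "invasion", "thriller"]},
  {"title": "Stories About Alien Worlds",
   "tags": ["alien", "alien worlds", "exploration"]},
  {"title": "Startup Founder Journeys",
   "tags": ["startups", "founders"]}
]
"""

DATA = json.loads(DATA_JSON)


def find_collections(selected_tags, data):
    """Return titles of collections sharing tags with the selection.

    Collections matching more of the selected tags come first.
    """
    selected = set(selected_tags)
    matches = []
    for collection in data:
        overlap = len(selected & set(collection["tags"]))
        if overlap:
            matches.append((overlap, collection["title"]))
    # Most overlapping tags first; ties are broken alphabetically.
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [title for _, title in matches]


print(find_collections(["alien", "thriller"], DATA))
```

Everything here is a plain lookup over static data, so the runtime cost and latency do not depend on any external model.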

Limitations

Below are some limitations of the current version and thoughts on improvements.

Precision

We use the term “precision” here to reflect how well the recommended movies, TV shows, and books match the specific collection title. For instance, if the collection “Top 5 Alien Invasion Thrillers with Unique Tactics” contains the movie “Forrest Gump”, that is a precision problem.

There are at least two sources of errors that impact precision:

ChatGPT-related errors. If the collection title is too specific, corresponding movies may not exist. Or ChatGPT may not be aware of less popular items. Both situations lead to suboptimal recommendations.
API-related errors. We observed cases where the recommendation itself is relevant, but the API returns a very different item.
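One possible way to catch the API-related errors is to compare the title that was requested with the title the API actually returned and flag pairs that differ too much. The article does not describe the API or any such check, so the sketch below is only an assumption: it supposes the API returns a title string, and it uses Python's standard difflib module with an arbitrary threshold of 0.6. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase the title, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def title_similarity(requested, returned):
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(requested), normalize(returned)).ratio()


def looks_like_mismatch(requested, returned, threshold=0.6):
    """Flag cases where the returned item is probably a different title.

    The threshold is an assumption and would need tuning on real data.
    """
    return title_similarity(requested, returned) < threshold


print(looks_like_mismatch("The Matrix", "the matrix!"))
print(looks_like_mismatch("The Matrix", "Forrest Gump"))
```

Flagged pairs could then be reviewed manually or dropped from the collection. A string comparison like this cannot detect ChatGPT-related errors, where the title is matched correctly but the item simply does not fit the collection.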

Recall

We use the “recall” term here to represent the variety of collections available. For example, if I want to find movies / anime / books about software engineers but no relevant collections show up - it’s a recall problem.

At least two factors limit recall (coverage):

Missing topics. In our case, collection titles are generated semi-automatically, and it’s the responsibility of the developer to suggest general topics for the collections. For example, as a developer, I could easily miss important topics like corporate career paths or employer-employee relationships in large corporations, just because I’m more focused on the startup world.
Poor coverage of specific topics. Even though the Ideas Explorer App (see the videos) usually does a good job of suggesting collection ideas based on the given topics, I could still disregard good collection titles due to a lack of deep domain understanding.

Future work

There are a huge number of cool things that we are looking forward to implementing.

Here, we restrict our imagination a bit and provide promising directions of work specifically related to recommenders.

More recommenders

If ChatGPT is familiar enough with a specific domain, one could employ the approach shown in the videos to build a recommender for that domain. The important point here is that no training data is needed to do it!

Here is a list of recommenders that seem valuable and unique:

Discover Chrome browser features. The user picks tags like “navigation, hotkeys” and sees a list of relevant “posts”.
Command line tips & tricks. The user picks tags like “file operations” and sees a list of “posts” that present file operations with the command line.
Memes [the hardest case]. The user picks tags like “startups, founders” and is shown collections of new, funny text-based memes about founders’ lives.
Better usage of Python. The user picks tags like “lambda expressions” and is shown collections that illustrate lambda expression usage in various contexts.
Discovery of Crypto / DeFi opportunities. The user picks tags like “investment, short term” and is shown collections that suggest relevant opportunities from the Crypto / DeFi world.

If you have in mind a recommender worth building, please share the idea in the comments. We plan to implement a number of recommenders in the near future, and we would prefer to implement the recommenders people really want.

Improving media recommender

Relying on Tags for navigation seems to be one of the key components of our media recommender. But what if we remove Tags altogether? What if the user picks any content they like (say, the movie “The Matrix” and the TV show “Breaking Bad”) and the system recommends other relevant movies, books, TV shows, and anime series? In this case, we remove the concept of Tags and work with “Items to Recommend” directly. Such a recommender may be more engaging and fun to interact with.

The good news is:

we think we know how to do it with ChatGPT and no training data
to build the improved media recommender, we can simply reuse the already generated collections
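The article does not reveal how the tag-free recommender would work. As one possible reading of “reuse the already generated collections”, the Python sketch below scores candidate items by how often they share a collection with the items the user liked. The collection contents are placeholders, and the function name recommend is hypothetical.

```python
from collections import Counter

# Placeholder data: collection title -> items (names invented for illustration).
collections = {
    "Collection 1": ["Movie A", "Show B", "Book C"],
    "Collection 2": ["Movie A", "Book C", "Anime D"],
    "Collection 3": ["Show B", "Movie E"],
    "Collection 4": ["Book F", "Movie G"],
}


def recommend(liked, collections, top_n=3):
    """Rank items that share collections with the liked items."""
    liked = set(liked)
    scores = Counter()
    for items in collections.values():
        overlap = len(liked & set(items))
        if overlap == 0:
            continue  # this collection says nothing about the user's taste
        for item in items:
            if item not in liked:
                # Collections containing more liked items count for more.
                scores[item] += overlap
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend(["Movie A", "Show B"], collections))
# → ['Book C', 'Anime D', 'Movie E']
```

Because items of different types (movies, books, anime) already sit together in the same collections, a co-occurrence score like this would recommend across content types without extra work.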

PS.

We are at a relatively early stage, so any thoughts, ideas, suggestions, or feedback will be really helpful!

And of course, check out the recommender: https://TopN.ai
