How Recommendation Engines Work

Colin Hayhurst

CEO @mojeek | No-Tracking Search Engine

Published: 25 September 2019

The internet has had a profound effect on our everyday lives, whether we are shopping online, streaming music and films or reading the news. In all of these interactions, it is commonplace to be offered new and alternative recommendations. For example, when you log on to Netflix, you are presented with a list of personal movie recommendations; Spotify provides you with a weekly list of songs that you might like to listen to; the BBC News website suggests other news stories for you to read; and if you log on to Amazon, the landing page contains a range of items you might be interested in buying.

All of these recommendations are provided by what we call recommendation engines, and it is advances in machine learning over the last two decades that have made them a powerful asset for online businesses.

There are two broad approaches to building a recommendation engine: collaborative filtering and content-based filtering.

Content-based filtering is the easier of the two to implement. It assumes a user will like items that are similar (as measured by their features) to items they have previously liked. For example, on Netflix, content-based filtering could be used to suggest movies that share an actor or genre with films you have already watched.
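As a minimal sketch of the idea, the snippet below describes each film by a set of feature tags and ranks unseen films by how much they overlap with a film the user liked. The catalogue, the tag names and the choice of Jaccard overlap as the similarity measure are all assumptions made for illustration; real systems may use quite different features and measures.

```python
def jaccard(a, b):
    """Similarity of two feature sets: shared features / all features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_by_content(liked, catalogue, top_n=2):
    """Rank unseen items by their best similarity to any liked item."""
    scores = {}
    for title, features in catalogue.items():
        if title in liked:
            continue  # do not recommend something already watched
        scores[title] = max(
            (jaccard(features, catalogue[seen]) for seen in liked),
            default=0.0,
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical catalogue: each film is tagged with a genre and a lead actor.
catalogue = {
    "Film A": {"sci-fi", "actor:x"},
    "Film B": {"sci-fi", "actor:y"},
    "Film C": {"romance", "actor:x"},
    "Film D": {"documentary", "actor:z"},
}

recommendations = recommend_by_content({"Film A"}, catalogue)
```

Here a user who liked "Film A" is offered the film with the same genre and the film with the same actor, while "Film D", which shares no features, is left out.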

Collaborative filtering is the key to the big advances in recommendation engines over the last decade and one of the best illustrations of the power of “Big Data”. This approach assumes that there are common trends and that every person's tastes are a combination of those trends. For example, on Amazon, new parents will need to buy similar items to each other, whilst students will tend to buy items such as a new laptop, notepads, etc. Businesses that have a big pool of data on users and their interactions with products can use collaborative filtering techniques to learn what those trends are and which items are associated with them. They can then calculate which trends each person belongs to and recommend the items associated with those trends.
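One simple way to see collaborative filtering in action is a user-based neighbourhood method: find users whose ratings resemble yours and suggest what they liked. Note that this is a simpler stand-in for the trend-learning models described above, not an implementation of them, and the users, items and ratings below are invented for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    dot = sum(u[item] * v[item] for item in set(u) & set(v))
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_for(user, ratings, top_n=1):
    """Score unseen items by the similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "parent_1": {"nappies": 5, "pram": 4, "baby monitor": 5},
    "parent_2": {"nappies": 5, "pram": 5},
    "student_1": {"laptop": 5, "notepads": 4},
    "student_2": {"laptop": 4, "notepads": 5, "nappies": 1},
}

suggestions = recommend_for("parent_2", ratings)
```

Because "parent_2" rates items much like "parent_1" does, the baby monitor comes out on top, ahead of the items favoured by the students.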

Both approaches have their advantages and disadvantages. The big disadvantage of collaborative filtering is that it typically requires a large amount of existing ratings data to learn from, a problem often referred to as the cold start problem. For example, if you were to start a new Netflix account, Netflix would have no information about you from which to work out which trends you belong to. Content-based filtering does not suffer from the cold start problem to the same extent. However, it can only offer results that are similar to items the user has previously liked, which limits the range of items that can be recommended compared to collaborative filtering. Many of the best sites that use recommendation engines use a hybrid model combining both types of filtering.
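There are many ways to combine the two approaches. The sketch below shows just one possibility: a weighted blend that leans on the content-based score for a brand-new user and shifts towards the collaborative score as that user accumulates ratings. The function name, the blending formula and the constant k are assumptions chosen for illustration, not a description of how any particular site works.

```python
def hybrid_score(content_score, collaborative_score, n_ratings, k=10):
    """Blend two scores, trusting the collaborative one more as ratings grow.

    k is a hypothetical tuning constant: with k ratings the two
    scores are weighted equally.
    """
    weight = n_ratings / (n_ratings + k)
    return (1 - weight) * content_score + weight * collaborative_score


# A new account has no ratings, so only the content-based score counts.
new_user = hybrid_score(content_score=0.8, collaborative_score=0.2, n_ratings=0)

# After 90 ratings the collaborative score dominates the blend.
regular_user = hybrid_score(content_score=0.8, collaborative_score=0.2, n_ratings=90)
```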

The priority for any basic implementation of these algorithms is to maximise predictive accuracy, i.e. to recommend the best items. However, there are other measures that need to be considered when building a recommendation engine. Diversity and serendipity are two such measures. Diversity is a measure of how varied the recommended items are allowed to be. This is particularly important for giving exposure to new products that do not yet have any ratings, such as a new song on Spotify. Serendipity can be thought of as how novel or surprising a recommended item is. For example, suggesting a popular Hollywood movie on Netflix is not as impressive as recommending an entirely new independent film that the user might like.
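To make the accuracy versus diversity trade-off concrete, here is a small sketch of a greedy re-ranking step: it picks items one at a time, rewarding a high predicted score but penalising similarity to items already picked. The trade_off parameter, the film names, the scores and the same-genre similarity function are all hypothetical.

```python
def rerank_for_diversity(scores, similarity, top_n=2, trade_off=0.5):
    """Greedily pick items, penalising similarity to those already picked.

    trade_off=1.0 ranks purely by predicted score; lower values
    give more weight to diversity.
    """
    chosen = []
    remaining = set(scores)
    while remaining and len(chosen) < top_n:
        def value(item):
            penalty = max((similarity(item, c) for c in chosen), default=0.0)
            return trade_off * scores[item] - (1 - trade_off) * penalty

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=value)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical predicted scores and genres.
genres = {"Blockbuster 1": "action", "Blockbuster 2": "action", "Indie film": "drama"}
scores = {"Blockbuster 1": 0.9, "Blockbuster 2": 0.85, "Indie film": 0.6}


def same_genre(a, b):
    """Crude similarity: 1.0 if two films share a genre, else 0.0."""
    return 1.0 if genres[a] == genres[b] else 0.0


accuracy_only = rerank_for_diversity(scores, same_genre, trade_off=1.0)
balanced = rerank_for_diversity(scores, same_genre, trade_off=0.5)
```

Ranking on score alone returns the two blockbusters, whereas the balanced setting swaps the second blockbuster for the independent film.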

Getting the balance between these measures right, while still predicting ratings accurately, is non-trivial but essential.

Another important factor to consider is the type of data you use to make recommendations. This can often be broken down into explicit and implicit data. Explicit data is the most informative and is typically a user interaction that directly indicates how much something is liked. A good example would be rating a movie on Netflix. Implicit data is typically a user interaction that does not directly indicate how much an item is liked, for example looking at an item on Amazon. There is a lot more implicit data available than explicit data, but it is less informative and requires more sophisticated models and/or more data to become useful.
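One common starting point for implicit data is to turn raw interaction events into a rough preference score per item, which can then be fed to a recommender in place of explicit ratings. The sketch below does this with a fixed weight per event type; the event names and weights are assumptions picked for illustration and would need tuning on real data.

```python
# Hypothetical weights: stronger signals of interest count for more.
EVENT_WEIGHTS = {"view": 1.0, "add_to_basket": 3.0, "purchase": 5.0}


def implicit_scores(events):
    """Sum event weights per item to get a rough preference score."""
    scores = {}
    for item, event in events:
        # Unknown event types contribute nothing.
        scores[item] = scores.get(item, 0.0) + EVENT_WEIGHTS.get(event, 0.0)
    return scores


# A hypothetical browsing session: (item, event) pairs.
session = [
    ("laptop", "view"),
    ("laptop", "view"),
    ("laptop", "purchase"),
    ("notepad", "view"),
]

preferences = implicit_scores(session)
```

Unlike a star rating, a score built this way says nothing about dislike: a low value may simply mean the user never came across the item.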

At DataJavelin we develop recommender systems for companies using machine learning techniques. We are particularly skilled in tackling the cold start problem: developing recommender systems where there is limited initial data.

This post originally appeared on https://www.datajavelin.com/post/how-recommendation-engines-work
