Building a “Next Best” Recommender System from Scratch is a Good Option

Recommender systems are a core use case for artificial intelligence in marketing. In fact, the global market for AI-based recommendation engines is expected to grow from $800 million in 2017 to $4.4 billion by 2022, a compound annual growth rate of 40.7%. Understandably, there’s a lot of interest in (and hype around) this topic, driven in part by a “battle of Titans” for share of the space among companies like Amazon, Google, HPE, IBM, Intel, Microsoft, Oracle and Salesforce, to name a few.

Stepping back from the hype, there really are good reasons to build product or service recommender systems, but there are also plenty of real-world problems and trade-offs to consider in the design phase, and a customized build is often the right answer. In this post, I’ll share two examples of how AI-based recommendation methods sometimes need to be adapted to fit the reality of a particular marketing problem.

But First, the Basics

Recommendation systems use artificial intelligence to infer the most relevant product or service to offer to each customer or prospect, thereby increasing the relevance of communications and improving engagement, consideration, conversion, retention and customer lifetime value. Given that these are the outcomes of better recommendations, it’s easy to see why recommendation systems would be of interest to marketers. Good recommender systems can deliver 5% lift in visit or purchase frequency vs. a control.

The most common machine-learning applications for “next best” recommender systems fall under a category called collaborative filtering, which is actually a family of methods with two primary branches. The first, Item-Based Collaborative Filtering (IBCF), looks at related products or services that are frequently purchased or viewed together. An example of this approach is “More like this” on Pinterest.
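As a minimal sketch of the item-item idea in base R (the purchase matrix and item names here are invented purely for illustration):

```r
# Toy user-x-item purchase matrix (1 = purchased); all data is illustrative
m <- matrix(c(1, 0, 1, 1,
              1, 1, 0, 0,
              0, 1, 1, 0,
              1, 0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("user", 1:4), paste0("item", 1:4)))

# Cosine similarity between every pair of item columns
norms    <- sqrt(colSums(m^2))
item_sim <- crossprod(m) / (norms %o% norms)

# Items most similar to item1, excluding itself ("more like this")
sort(item_sim["item1", -1], decreasing = TRUE)
```

The last line is the essence of a “More like this” widget: rank every other item by its similarity to the one the user is viewing.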

Building on this, market-basket analysis often looks at three key metrics: support, confidence and lift. Support describes how often an item (or itemset) occurs in the dataset. Confidence describes how frequently Y appears in transactions that contain X, i.e., support(X and Y) / support(X). Lift measures the chance of buying Y given that X was purchased, compared to the chance of buying Y in general; a lift above 1 means the two items go together more often than chance alone would predict.
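To make those definitions concrete, here is a minimal base-R sketch computing all three metrics for a hypothetical rule X → Y over a toy set of transactions (in practice, a package such as arules automates this at scale):

```r
# Toy transactions: each row is a basket; 1 = item present (illustrative data)
tx <- matrix(c(1, 1, 0,
               1, 1, 1,
               1, 0, 1,
               0, 1, 0,
               1, 1, 0),
             nrow = 5, byrow = TRUE,
             dimnames = list(NULL, c("X", "Y", "Z")))

n <- nrow(tx)
support_X  <- sum(tx[, "X"]) / n              # fraction of baskets containing X
support_Y  <- sum(tx[, "Y"]) / n              # fraction of baskets containing Y
support_XY <- sum(tx[, "X"] & tx[, "Y"]) / n  # fraction containing both

confidence <- support_XY / support_X          # P(Y | X)
lift       <- confidence / support_Y          # P(Y | X) / P(Y)

c(support = support_XY, confidence = confidence, lift = lift)
```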

Typical uses of IBCF include:

  • Better online recommendations
  • Product bundling for promotions
  • Substituting high-margin or strategic brands for lower-margin or less strategic ones
  • Piggybacking slow-moving products onto more successful ones
  • Optimizing shelf placement of products
  • Improving the effectiveness of a print or online catalog
  • Assortment planning

Another common approach to this same problem is User-Based Collaborative Filtering (UBCF), a method that collects and evaluates choices made by other customers, visitors or guests. Unlike IBCF, this approach looks at similarities between people. Various measures exist for calculating that similarity, including Euclidean distance and cosine similarity. Usually, multiple measures are tried, and the best-performing approach is selected based on your data and objectives.
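Both of the similarity measures just mentioned can be computed in a few lines of base R, sketched here on an invented ratings matrix:

```r
# Toy user-x-item ratings matrix (rows = users); illustrative data
r <- rbind(alice = c(5, 3, 0, 1),
           bob   = c(4, 3, 0, 1),
           carol = c(1, 1, 5, 4))

# Euclidean distance between users (smaller = more similar)
as.matrix(dist(r, method = "euclidean"))

# Cosine similarity between users (larger = more similar)
norms   <- sqrt(rowSums(r^2))
cos_sim <- (r %*% t(r)) / (norms %o% norms)
round(cos_sim, 3)
```

On this toy data, alice and bob come out highly similar under both measures, so each would be a natural source of recommendations for the other.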

Even more advanced methods are possible, including customer genome mapping, which is described in my soon-to-be-published article: “The Customer Genome - An Empirical Approach to Segment-of-One Marketing,” DMA Analytics Journal, April 2018. Genomes are a hot topic at the cutting edge of recommender systems. As an example, Pandora just announced that it will roll out “music genomes” this month as part of a more customized recommender strategy designed to compete with Spotify. (In reality, the approach they characterize as genomes is actually just IBCF, based on very long strings of binary variables.)

Whether by means of genome mapping or more traditional methods, the output of a recommender engine is typically specified in the form of a waterfall: the most likely, second-most likely, and third-most likely product or service to recommend, in prioritized order of likely interest to the user, ready for automated use by a marketing system. This work can all be done in R which, of course, is open source.
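For example, given a matrix of predicted scores per customer (the scores and product names below are hypothetical), the waterfall is just the top three products by score:

```r
# Hypothetical predicted scores (rows = customers, cols = products)
scores <- rbind(cust1 = c(P1 = 0.12, P2 = 0.61, P3 = 0.30, P4 = 0.55),
                cust2 = c(P1 = 0.80, P2 = 0.05, P3 = 0.44, P4 = 0.10))

# Top-3 waterfall per customer, in descending order of likely interest
waterfall <- t(apply(scores, 1, function(s) names(sort(s, decreasing = TRUE))[1:3]))
colnames(waterfall) <- c("first", "second", "third")
waterfall
```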

Measures of Performance

The performance of collaborative filtering and similar systems can be evaluated using cross-validation on holdout data. For example, you would probably want to know the percentage of cases where the predicted product or service was actually selected. This measure is sometimes referred to as the true positive rate (TPR).

Another useful approach for measuring performance is Average Reciprocal Hit Rate (ARHR), which compares how different models perform in terms of the ranked order of their recommendations. Each hit is weighted by the reciprocal of its rank, so a correct first recommendation counts for more than a correct third one. For example, if the algorithm recommended Product 1 (P1), Product 2 (P2) and Product 3 (P3) in that order, and the customer actually purchased P2, then the ARHR measure would be:

ARHR = (0*1) + (1*1/2) + (0*1/3) = 0.5

If they purchased both P1 and P2, then the ARHR measure would be:

ARHR = (1*1) + (1*1/2) + (0*1/3) = 1.5

and so forth. When averaged across a database, ARHR offers a simple way to compare performance of different algorithms vs. a baseline prediction and also vs. each other.
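As a minimal sketch in base R, reproducing the two worked examples above (the function and variable names are my own):

```r
# ARHR for one customer: sum of 1/rank over recommended items actually purchased
arhr_one <- function(recommended, purchased) {
  hits <- recommended %in% purchased    # TRUE where a ranked item was bought
  sum(hits / seq_along(recommended))    # weight each hit by the reciprocal of its rank
}

arhr_one(c("P1", "P2", "P3"), "P2")             # 0.5, as in the first example
arhr_one(c("P1", "P2", "P3"), c("P1", "P2"))    # 1.5, as in the second example

# Averaged across a database of customers, e.g.:
# mean(mapply(arhr_one, recommended_lists, purchased_lists))
```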

Per-guest distance metrics can optionally be provided as well, based on the fact that collaborative filtering is actually a kind of specialized nearest-neighbor problem. F1 scores are yet another approach that can be used to measure model performance.
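For completeness, here is a quick, hypothetical F1 sketch for one customer, comparing a set of recommendations against actual purchases (all item names invented):

```r
# F1 for a set of recommendations vs. actual purchases (illustrative data)
recommended <- c("P1", "P2", "P3")
purchased   <- c("P2", "P3", "P4", "P5")

tp <- length(intersect(recommended, purchased))   # correctly recommended items
precision <- tp / length(recommended)             # 2/3 of recommendations were bought
recall    <- tp / length(purchased)               # 2/4 of purchases were anticipated
f1 <- 2 * precision * recall / (precision + recall)
f1                                                # ~0.571
```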

OK, that’s the basics. But it’s not always so easy.

Real World Problem #1 - Financial Services

A financial services provider had previously implemented a recommender system for cross-selling banking products. The number of products that they had available to offer was fairly limited. Their goal was to improve model performance as compared to the existing approach.

The problem was that many popular AI / machine learning recommender methods work best when the number of choices is high, which was not the case here. Also, typical methods have no time dimension: they may correctly identify that a product is a fit for a particular customer, but not whether it is a fit within, say, the next 30 days. Finally, such systems are often designed to always recommend a product or service, even when no genuine opportunity exists.

Many problems...

There are several ways around this. One approach is to build a hierarchy of models, where one model helps determine whether we should target a customer at all, and then separate models give us the product or service to serve to those particular customers who should actually be targeted.
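As a rough sketch of the first stage of such a hierarchy using logistic regression in R (all data, variable names and thresholds here are invented for illustration):

```r
# Synthetic customer data for illustration only
set.seed(1)
n <- 500
train <- data.frame(tenure  = runif(n, 0, 10),
                    balance = rnorm(n, 5000, 2000),
                    recency = runif(n, 0, 12))
# Hypothetical outcome: longer-tenured, more recent customers respond more
train$responded <- rbinom(n, 1, plogis(-1 + 0.2 * train$tenure - 0.1 * train$recency))

# Stage 1: should we target this customer at all? (binary model)
target_model <- glm(responded ~ tenure + balance + recency,
                    data = train, family = binomial)

# Keep only customers above a business-chosen score threshold
p        <- predict(target_model, type = "response")
targeted <- train[p > 0.5, ]

# Stage 2 (not shown): separate product-choice models, fit and scored
# only on the 'targeted' subset
```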

Another approach is a multinomial model, where a single algorithm delivers the probabilities for each product or service, including the probability of nothing being chosen.
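A minimal sketch of that idea using the multinom function from the nnet package (included with standard R installations); the data and variable names are invented, and the key point is the explicit “none” class:

```r
library(nnet)

# Synthetic data for illustration only
set.seed(2)
n <- 600
dat <- data.frame(income = rnorm(n, 60, 15),
                  age    = runif(n, 20, 70))
# Hypothetical outcome: one of three products, or "none"
dat$choice <- factor(sample(c("none", "savings", "loan", "card"),
                            n, replace = TRUE))

# One model returns a probability for every product AND for "none"
mn <- multinom(choice ~ income + age, data = dat, trace = FALSE)
head(predict(mn, type = "probs"))   # per-customer probabilities across all classes
```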

Yet another approach is to look at product affinities and first determine which product or service category to sell, and then, within that, the specific product or service to pitch. (This works best if you’re dealing with product variants.)

All of these approaches involve AI / machine learning methods, and all of them must deal with class imbalance, which is a typical problem for cross-sell models in many situations. Trying out each of these methods and delivering a final model would take about twelve weeks, end to end.
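As one illustration of handling imbalance, down-sampling the majority class before fitting is a simple, common remedy (class weights or up-sampling are alternatives); the data below is synthetic:

```r
# Down-sample non-buyers so classes are balanced before model fitting
# 'bought' is a hypothetical 0/1 outcome with few 1s
set.seed(3)
dat <- data.frame(x      = rnorm(1000),
                  bought = rbinom(1000, 1, 0.05))   # ~5% positive class

pos <- dat[dat$bought == 1, ]
neg <- dat[dat$bought == 0, ]
balanced <- rbind(pos, neg[sample(nrow(neg), nrow(pos)), ])
table(balanced$bought)   # roughly equal classes for training
```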

Real World Problem #2 - Hospitality

In the hospitality industry, a resort brand wanted to recommend the most relevant property to its guests. Many visitors return again and again to the same resort, but evidence showed that some of their guests could be described as “Samplers.” These people apparently sought to experience multiple resorts within the family of brands.

Given this insight, it would not be optimal to simply push the “primary” (i.e., most visited) resort to every single guest, even though the preponderance of visitors do indeed have a preferred resort, and should see messages about that property.

This situation highlights an important constraint of collaborative filtering methods, which is that they always recommend a new, previously unchosen product or service (or resort, in this case). That would not be an acceptable outcome for this situation, because a previously visited resort would actually be the most relevant property to feature for most guests.

What to do?

Two approaches can be used in this situation. One is to first segment guests based on a model, such as Loyalists vs. Variety Seekers, and then to use different IBCF models per segment.

Another approach would be to develop n propensity models, one for each resort, and serve up recommendations based on a waterfall of propensity model scores. An advantage of this second approach is that it allows marketers to set a score threshold below which a resort would not be recommended because of apparently low relevance, as quantified by the model score.
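Here is a sketch of that threshold-plus-waterfall logic on hypothetical propensity scores (guest names, resort names and the threshold are all invented):

```r
# Hypothetical propensity scores: rows = guests, cols = resorts
scores <- rbind(guest1 = c(resortA = 0.72, resortB = 0.40, resortC = 0.10),
                guest2 = c(resortA = 0.15, resortB = 0.18, resortC = 0.12))

threshold <- 0.25   # business-chosen relevance floor

recommend <- function(s, k = 3, floor = threshold) {
  s <- sort(s[s >= floor], decreasing = TRUE)   # drop low-relevance resorts
  names(head(s, k))                             # waterfall of remaining resorts
}

apply(scores, 1, recommend)
```

Note that guest2 falls below the threshold for every resort and so receives no recommendation at all, rather than a forced one.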

Summary

From even just these two examples, it’s easy to see that effective recommender systems require thought and effort to build, along with good collaboration and creative problem solving. Fortunately, the effort is worth it: significant lift in top-line revenue can be achieved, and a 5% lift in purchase frequency is a big deal.

Jim Griffin is the Americas Director of Cartesian DataSciences.

Any questions or feedback? Please feel free to post here, and I’ll respond in Comments, or you can contact me at [email protected]

Suman Ipe

Principal Sales Engineer

1 year ago

Hi - thanks - very informative. With regard to product placement, I have heard the recommender engine can determine placement with regard to the margin of the product. Could you explain how that works, in layman's terms? Thanks again

Amy Birdee

Data Scientist at Meta

3 years ago

Great article! I'll soon be building a recommender model and this has given me plenty of options (and potential problems!) to think about!

Jay Tkachuk

Executive Vice President, Chief Digital Officer

6 years ago

A very good read.
