Graph Customer Similarity



… or how we devised a new way to calculate similarity across networks of shops…

This project was more a piece of research than a product, but it showed real potential to generate significant revenue for the Bank I worked at. Because the work is novel and in-depth, this short case study will not give all the details of the implementation. However, the project was presented extensively at the time, and if you are interested you can find write-ups and presentations online here https://www.slideshare.net/HarryPowell4/graph-recommendations


[BTW, I am tempted to revisit the work in the light of watching Jure Leskovec’s fascinating Stanford University course on Machine Learning with Graphs https://www.youtube.com/playlist?list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn ]


At the core of machine learning techniques is a distance metric telling you how similar one data point is to another. We became interested in how similar two businesses (or two customers) were to each other. This could be used in a recommendation engine, for example. We thought about our transactional data, which is normally represented as a sequence of transactions. However, it can also be reimagined as a graph or network, much like the better-known social networks of Facebook or Twitter. Mathematically, any set of pairs (in this case transactions between customers and retailers) can be thought of as a graph, in this case a bipartite one, with each retailer connected to other retailers through a set of shared customers. This offers the opportunity to measure the distance between businesses (or alternatively between customers) in preference space, with a business being more similar the fewer steps it takes to reach it through the graph.
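To make the representation concrete, here is a minimal sketch of building such a bipartite graph from a list of transactions. The customer and retailer names are invented and it assumes the networkx library; it is an illustration, not the pipeline we actually used.

```python
# A minimal sketch of re-imagining card transactions as a bipartite graph.
# The customers, retailers and transactions below are invented.
import networkx as nx

transactions = [
    ("alice", "ikea_bristol"),
    ("alice", "corner_cafe"),
    ("bob", "ikea_bristol"),
    ("bob", "bike_shop"),
    ("carol", "corner_cafe"),
]

G = nx.Graph()
customers = {c for c, _ in transactions}
retailers = {r for _, r in transactions}
G.add_nodes_from(customers, bipartite=0)  # one side of the graph: customers
G.add_nodes_from(retailers, bipartite=1)  # the other side: retailers
G.add_edges_from(transactions)            # one edge per customer-retailer pair

# "Distance in preference space": fewest steps between two retailers.
print(nx.shortest_path_length(G, "corner_cafe", "bike_shop"))  # 4, via alice and bob
```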


A lot of academic work has been done on graph theory and there are many metrics and concepts for measuring distance, but none of those conventionally used quite worked in this case. The key metric in graph theory is minimum distance, in other words the length of the shortest path which connects two nodes. This does not work in our kind of network because, for example, virtually everyone in the south west of England has been to the Ikea at Cribbs Causeway in Bristol: there are hubs in the network. This means that the minimum distance between any two shops is always either one or two steps, which is not very useful.
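As a small illustration of the hub problem, the sketch below (again with invented data and networkx) builds a bipartite graph in which every customer visits one local shop plus a single hub store. Projecting onto the shops, every pair of local shops ends up exactly two steps apart, so the metric carries almost no information.

```python
# Sketch of the hub problem: if one store is visited by every customer, the
# shortest-path distance between any two shops collapses. All data invented.
import itertools
import networkx as nx
from networkx.algorithms import bipartite

customers = [f"cust_{i}" for i in range(200)]
local_shops = [f"shop_{i}" for i in range(20)]

B = nx.Graph()
B.add_nodes_from(customers, bipartite=0)
B.add_nodes_from(local_shops + ["hub_store"], bipartite=1)

for i, c in enumerate(customers):
    B.add_edge(c, local_shops[i % len(local_shops)])  # everyone has a local shop
    B.add_edge(c, "hub_store")                        # and everyone visits the hub

# Project onto shops: two shops are adjacent if they share at least one customer.
shops = bipartite.projected_graph(B, local_shops + ["hub_store"])
dists = {nx.shortest_path_length(shops, a, b)
         for a, b in itertools.combinations(local_shops, 2)}
print(dists)  # {2}: every pair of distinct local shops is exactly two steps apart
```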


Alternatively, one could think about the distance between two businesses using cosine similarity, a conventional metric in recommender systems, but this is again flawed in any network which is sparse. Cosine similarity measures the overlap between the customers of two shops, but given that people tend to shop locally, this overlap may be virtually nonexistent between shops in different localities. However, there may be considerable overlap at the second or third degree of separation, which conventional metrics do not capture.
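Here is a toy sketch of that failure mode, using made-up 0/1 customer-incidence vectors: cosine similarity scores two shops with no common customers as completely dissimilar, even when a third shop connects them at the second degree.

```python
# Toy illustration of cosine similarity over customer-incidence vectors.
# Shops and customers are invented; a 1 means the customer shopped there.
import numpy as np

#                           c1 c2 c3 c4 c5 c6
shop_bristol   = np.array([ 1, 1, 1, 0, 0, 0])
shop_newcastle = np.array([ 0, 0, 0, 1, 1, 1])
shop_bridge    = np.array([ 0, 0, 1, 1, 0, 0])  # shares one customer with each

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(shop_bristol, shop_newcastle))  # 0.0: no overlap, so "maximally far apart"
print(cosine(shop_bristol, shop_bridge))     # ~0.41: the indirect link above is invisible
```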


We wanted to come up with a metric that could measure the distance between two nodes using the information in the entire graph. We imagined standing in a shop (the source node) with a number of red tokens, giving one to each customer who entered the shop, and instructing them to mark that token and then hand it over to a random customer at the next shop they visited, instructing that customer to do likewise. We then imagined standing at another shop (the destination node) waiting for any customer with a token, which we would collect. The average number of marks on the collected tokens would be the expected degrees of separation (EDS) between the source and destination nodes. This gave us a similarity metric that met our requirements.
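A simple way to see what the metric computes is to simulate the thought experiment directly. The sketch below is a hypothetical Monte Carlo version: tokens random-walk from shop to customer to shop, and we average the hop counts of the tokens that reach the destination (a cap on the walk length keeps the simulation finite).

```python
# Hypothetical Monte Carlo version of the token experiment. bipartite_adj maps
# every node (shop or customer) to a list of its neighbours in the graph.
import random

def estimate_eds(bipartite_adj, source_shop, dest_shop,
                 n_tokens=10_000, max_hops=50):
    marks = []
    for _ in range(n_tokens):
        shop, hops = source_shop, 0
        while hops < max_hops:
            customer = random.choice(bipartite_adj[shop])  # hand a token to a customer
            shop = random.choice(bipartite_adj[customer])  # they carry it to their next shop
            hops += 1                                      # one more mark on the token
            if shop == dest_shop:
                marks.append(hops)                         # collect the token
                break
    return sum(marks) / len(marks) if marks else float("inf")

# Tiny invented graph: shop_A and shop_C are only connected through shop_B.
adj = {
    "shop_A": ["c1", "c2"], "shop_B": ["c2", "c3"], "shop_C": ["c3"],
    "c1": ["shop_A"], "c2": ["shop_A", "shop_B"], "c3": ["shop_B", "shop_C"],
}
print(estimate_eds(adj, "shop_A", "shop_C"))  # estimated expected degrees of separation
```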


So far so good. But the real challenge was how to compute this in a network of, say, 1 million businesses. The model can be represented as a Markov transition matrix with absorbing states, which has a closed-form solution. Unfortunately, the closed form involves inverting a 1-million-by-1-million matrix, once for every destination node, in order to calculate the EDS between every pair of nodes. Given that matrix inversion is an O(n^3) process, that's going to take a long time.
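For a toy shop-to-shop transition matrix, the closed-form calculation looks roughly like this: make the destination absorbing, take the transient block Q, and solve (I - Q) t = 1 for the expected number of hops to absorption. The matrix here is invented; the point is that the linear solve (or inversion) has to be repeated for every destination node.

```python
# Closed-form sketch for one destination: make it absorbing, take the transient
# block Q of the shop-to-shop transition matrix and solve (I - Q) t = 1.
# P is an invented 3-shop matrix; the real one had around a million rows.
import numpy as np

P = np.array([       # row-stochastic: P[i, j] = probability of hopping shop i -> shop j
    [0.0, 0.6, 0.4],
    [0.5, 0.0, 0.5],
    [0.3, 0.7, 0.0],
])
dest = 2             # the destination shop

transient = [i for i in range(len(P)) if i != dest]
Q = P[np.ix_(transient, transient)]                  # transitions that avoid the destination
t = np.linalg.solve(np.eye(len(Q)) - Q, np.ones(len(Q)))
print(dict(zip(transient, t)))                       # expected hops from each shop to dest

# Repeating this solve for every one of ~1 million destinations is what makes
# the exact O(n^3) approach intractable at full scale.
```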


We thought quite a lot about how to reorder the transition matrix into a nearly block-diagonal form so that it could be partitioned efficiently (it is, after all, quite unlikely that a customer of a supermarket in Bristol is also a customer of a supermarket in Newcastle), and we came up with a number of promising strategies. An alternative is to use a sampling methodology to estimate EDS, or simply to apply the transition matrix a limited number of times in the hope that the EDS estimate will have converged sufficiently by then. Both of these methods are pretty quick in comparison to the exact solution, particularly if you can compress your transition matrix into a small enough size to fit into the memory of a GPU card.
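Here is a sketch of the truncated approach, using the same kind of toy matrix as above: instead of solving the linear system exactly, we advance the walk a fixed number of hops with cheap matrix-vector products and compute the expected hop count conditional on arriving within the cut-off. The cut-off value is arbitrary, exactly as discussed below.

```python
# Truncated estimate: advance the walk at most max_hops steps with cheap
# matrix-vector products instead of a full linear solve. Same invented P as above.
import numpy as np

P = np.array([[0.0, 0.6, 0.4],
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])

def truncated_eds(P, source, dest, max_hops=10):
    transient = [i for i in range(len(P)) if i != dest]
    Q = P[np.ix_(transient, transient)]   # hops that stay away from the destination
    r = P[transient, dest]                # probability of arriving on the next hop
    walk = np.zeros(len(Q))
    walk[transient.index(source)] = 1.0   # the walk starts at the source shop

    mass, weighted_hops = 0.0, 0.0
    for k in range(1, max_hops + 1):
        p_arrive = float(walk @ r)        # P(first reach dest at exactly hop k)
        mass += p_arrive
        weighted_hops += k * p_arrive
        walk = walk @ Q                   # advance one more hop without arriving
    return weighted_hops / mass           # EDS, ignoring paths longer than max_hops

print(truncated_eds(P, source=0, dest=2, max_hops=8))  # close to the exact value above
```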


The results were interesting. EDS produced a much more intuitive notion of distance than the conventional metrics, but the estimate of EDS with a limited path length turned out to be even better than the exact calculation using the closed-form solution. This is unusual: normally you would expect the exact solution to be the best. However, it turns out that only the shorter paths between source and destination nodes carry much informational value. Once a path departs greatly from a relatively direct route, its arrival at the destination is as much coincidence as anything else. By limiting the paths with an arbitrary cut-off distance we eliminated that noise and ended up with much better results. This was a surprise to us and a big learning: occasionally estimation can improve the informational quality of a result.


The other big learning was that computation is often the real problem in data science. This is often overlooked: people assume that you can just buy enough processing power to solve any problem. But a lot of machine learning methodologies have non-linear complexity, while our ability to add processors is essentially linear. We needed to use all our understanding of what was going on “under the bonnet” of the computer in order to make our calculation tractable, and we needed to take computational considerations into account virtually from the start of the project, even when we were thinking about the analytical challenge. One of the reasons we were able to do this is that the team at the Bank was structured to integrate the mathematicians and the computer scientists very closely. We never built small-data prototypes and then threw them over the wall to some poor data engineer to scale. I don't believe that often results in a satisfactory product.


A final interesting thing about this project was that while we built it as a research project with a particular use case in mind (recommendation), it will probably be used more for another use case: partitioning data for competitor cohort analysis. Alternatively, the vector of distances between businesses or customers is effectively a complete description of a customer's latent preferences or a business's values. This is like a strand of DNA which can describe a subject in much more detail than any set of features drawn from a cross-sectional database, and it has more credibility than clickstream behaviour data because it is derived from actual purchases. While it is important that data scientists focus on problems that the business needs to solve, it is also essential that your data scientists are given some freedom to develop new ideas and new technologies which have the potential to transform your business. This project was one of those.
