What I learnt doing Stanford's course on Machine Learning with Graphs

A quick intro to Network Science

Graph theory and network science originated as far back as the 18th century in Königsberg, East Prussia, a thriving city of merchants and ships. The city had built seven bridges across the river Pregel, like so.


The 7 bridges of river Pregel.

This particular network of land masses (nodes) and bridges (edges) gave birth to the popular puzzle known as the Seven Bridges of Königsberg: can someone walk across all seven bridges without ever crossing any one bridge twice? The famous Swiss mathematician Leonhard Euler proved that the problem has no solution. His elegant work on the matter is considered the first theorem of graph theory and the first proof in network science.

We have since come a very long way from there. Network science is today applied in a wide variety of sophisticated use cases. We use graph machine learning techniques to identify fraudulent transactions, to study protein-protein interactions and discover potential side-effects of new drugs, and to make friend recommendations on Facebook and product recommendations on Amazon.

Google's PageRank algorithm is grounded entirely in graph theory: the world wide web is a massive network of webpages (nodes) connected to one another through hyperlinks (edges).

The intuition behind Google's PageRank algorithm is this: a page's rank is a weighted sum of the ranks of all the web pages that link into it. Page B in this case has plenty of in-links, and hence its rank is high.
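That weighted-sum intuition can be captured in a few lines of power iteration. The sketch below is a minimal illustration, not Google's production algorithm, and the four-page web (`links`) is hypothetical, set up so that most pages link into B:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank: a page's rank is a damped, weighted
    sum of the ranks of the pages that link into it."""
    pages = sorted(set(links) | {q for tgts in links.values() for q in tgts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}    # teleport term
        for p in pages:
            targets = links.get(p, [])
            if not targets:                            # dangling page: spread evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:                                      # p passes rank/out-degree onward
                for q in targets:
                    new[q] += damping * rank[p] / len(targets)
        rank = new
    return rank

# Hypothetical four-page web: A, C and D all link into B, so B ranks highest.
links = {"A": ["B"], "B": ["A"], "C": ["B"], "D": ["A", "B"]}
ranks = pagerank(links)
```

Repeating the update until it stabilises gives the stationary rank vector; the damping factor models a surfer who occasionally jumps to a random page.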

Network science is applied in biology to understand the human disease network by connecting genes whose mutations are known to cause diseases. It is applied in electricity transmission networks to understand load and failure patterns, to uncover vulnerabilities in the network.

But perhaps the quirkiest application of graph theory comes from an internal organisational-structure analysis carried out by a Hungarian firm. While the formal hierarchy is typically used to pass down communications, informal networks often have a huge impact on employee perception. An accurate graph of these informal networks can be quite useful for identifying silos between functions, and equally for identifying powerful nodes. This Hungarian firm realised that the perceptions of its frontline staff had absolutely nothing to do with the intentions of senior management. So they conducted a study asking employees one simple question: who do you look to for advice on work-related matters when you need it?

In this graph, the different nodes are different employees. The edges, or connections between nodes, indicate who that employee looks up to for advice. Do you notice the hubs?

Their informal network graph clearly showed hubs: nodes with lots of incoming connections. One health-and-safety employee - who travelled to all the sites, was a jolly good fellow well liked by the frontline staff, and spent lots of time on the shop floor - turned out to be that red hub, with a disproportionate number of people saying they looked to him for advice. He was clearly passing on his own views about the leadership's actions. What do you suggest we do with him?

Why do we need to apply machine learning techniques to graphs in the first place?

One - Graphs are notoriously incomplete; we often lack critical information. Think of a bank with millions of accounts but no clear knowledge of which accounts are used for fraudulent purposes. Deep learning techniques are extremely effective in such areas, being able to perform node classification tasks.

Two - Graph machine learning techniques are also quite helpful in edge-level tasks such as link prediction. Think of recommender systems, or perhaps a very simple social network of friends, like the one below.

A simple social network of friends - The nodes represent people and the solid line edges represent friendships.

How likely is it that A and B end up being friends? Quite likely, isn't it? They are what are known as one-hop neighbours, connected through a mutual friend C. Now how likely is it for C and G to become friends in the future? Not as likely, given they are two-hop neighbours. This is a rather simple graph with 8 nodes and 8 edges. Now imagine a social network like Facebook, with around 3 billion active users (nodes) and over a trillion edges! Making friend recommendations at that scale is no simple task, and one that can realistically only be done with machine learning techniques.
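One classical baseline for this kind of friend recommendation is to score each missing edge by the number of mutual friends. The graph below is a hypothetical stand-in for the 8-node, 8-edge figure (the exact edges aren't reproduced here), wired so that A and B share the mutual friend C while C and G sit further apart:

```python
from itertools import combinations

def common_neighbour_scores(adj):
    """Score every non-edge by the number of shared friends: a crude
    but classical link-prediction heuristic for friend recommendations."""
    scores = {}
    for a, b in combinations(sorted(adj), 2):
        if b not in adj[a]:                      # only score pairs not yet connected
            scores[(a, b)] = len(adj[a] & adj[b])
    return scores

# Hypothetical 8-node, 8-edge friendship graph: A and B share the mutual
# friend C; C and G have no friends in common.
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"),
         ("E", "G"), ("G", "H"), ("H", "F"), ("F", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = common_neighbour_scores(adj)
```

A real system would feed scores like these (or learned node embeddings) into a ranking model rather than recommending on raw counts alone.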

Three - It is often beneficial to identify communities and motifs (dominant or recurring patterns) in networks. Consider this telecommunication network of people in Belgium. The nodes are people and the links are phone calls they frequently make. The red-green scale represents the language they speak: red for French speakers, green for Dutch speakers. Do you notice a clear pattern? Each language community is densely connected internally, while the links between the two communities are very weak. The bilingual speakers, shown in the middle and blown up for clarity, are critical to the society. They are the ones who help integrate the different communities and drive the emergence of consensus on issues of national importance. What happens if they are gone?
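The Belgian call data itself isn't available here, but the pattern of communities that are dense inside and weakly linked across can be illustrated with label propagation, one simple community-detection heuristic. The toy graph below - two tight cliques joined by a single weak tie - is made up for illustration:

```python
from collections import Counter

def label_propagation(adj, max_iters=20):
    """Detect communities: each node repeatedly adopts the most common
    label among its neighbours (asynchronous updates, deterministic
    tie-break on the largest label) until no label changes."""
    labels = {v: v for v in adj}
    for _ in range(max_iters):
        changed = False
        for v in sorted(adj):
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            new = max(l for l, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    return labels

# Two tight 4-cliques joined by a single "weak" edge (3-4), mimicking the
# two densely connected language communities in the phone-call graph.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
adj = {v: set() for v in range(8)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

communities = label_propagation(adj)
```

Labels flood freely inside each clique but stall at the single weak tie, so the two cliques end up with different community labels.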

While identifying communities in this network structure is easy for the naked eye, subtler network motifs are massively helpful in the biological sciences - in protein-protein interactions, for instance, or in identifying allergens in drugs. We know that cellular components associated with a specific disease (phenotype) tend to cluster in the same network neighbourhood, and Graph Neural Networks are ideally suited to identifying these components in biological networks.

But how do Machine Learning Models learn the intricate topology behind graphs in the first place?

At the heart of machine learning models on graphs, even in the most complex applications, is a concept called message passing. The idea is fairly simple: tell me about your friends, and I will tell you about you. Each node has neighbours, and each neighbour has further neighbours. Node attributes are fed in as features of a neural network, messages from neighbours are aggregated layer by layer, and an embedding is created for each node.

These node embeddings are then passed through a prediction head (for node labelling, edge classification and so on) and inferences are made. A loss function is used to train the model - standard stuff.
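As a minimal sketch of that pipeline's core step, here is one round of mean-aggregation message passing on a made-up toy graph; a real GNN layer would additionally apply a learned weight matrix and a non-linearity before the prediction head:

```python
def gnn_layer(adj, features):
    """One round of message passing: each node's new embedding is the
    average of its own features and its neighbours' features -
    "tell me about your friends and I will tell you about you"."""
    new = {}
    for v, feats in features.items():
        msgs = [features[u] for u in adj[v]] + [feats]   # neighbours plus self
        dim = len(feats)
        new[v] = [sum(m[d] for m in msgs) / len(msgs) for d in range(dim)]
    return new

# Toy triangle-plus-tail graph with hand-picked 2-d node features.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
feats = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.0, 0.0]}

h1 = gnn_layer(adj, feats)   # after one hop: node 3 only sees node 2
h2 = gnn_layer(adj, h1)      # after two hops: information from nodes 0 and 1 reaches node 3
```

Stacking k such layers lets each node's embedding absorb information from its k-hop neighbourhood, which is exactly why depth matters in a GNN.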

But that's not all - a lot of research and advances have been made on GNNs since the early days of Larry Page's PageRank algorithm. Today we can leverage advances from transformers (encoder-decoder) and attention models in GNNs. We can generate synthetic graphs to predict how networks will evolve and to identify anomalies. We can analyse more complex structures like bipartite and tripartite graphs. We can create embeddings for entire sub-graphs, not just individual nodes, so we can compare and contrast different network structures. And we can build knowledge graphs and predict node properties, with several rather cool real-world applications such as suggesting author-author collaboration opportunities and paper-citation recommendations.

Back to the course - how was it?

The course is fairly intense, but I have walked away with a thorough understanding of how GNNs work and what they can be used for, and with the ability to code a full-scale graph ML model with DeepSNAP and PyTorch Geometric. I must have spent hours and hours on the coding exercises trying to squeeze higher accuracy out of the GAT (graph attention network) and multi-head attention models - possibly the most challenging exercises in the program. I'd highly recommend it to anyone looking to scratch the surface a bit deeper.

On to the next one!
