What I learnt doing Stanford's course on Machine Learning with Graphs

A quick intro to Network Science

Graph theory and network science originated as far back as the 18th century in Königsberg, East Prussia, a thriving city of merchants and ships. The city had built seven bridges across the river Pregel, like so.


The 7 bridges of river Pregel.

This particular network of land masses (nodes) and bridges (edges) gave birth to the popular puzzle known as the Seven Bridges of Königsberg: can someone walk across all seven bridges without ever crossing any one bridge twice? The famous Swiss mathematician Leonhard Euler proved that the problem has no solution. His elegant work on the matter is considered the first theorem of graph theory and the first proof in network science.

We have since come a very long way from there. Network science is today applied in a wide variety of sophisticated use cases. We use graph machine learning techniques to identify fraudulent transactions, to study protein-protein interactions and discover potential side-effects of new drugs, and to make friend recommendations on Facebook and product recommendations on Amazon.

Google's PageRank algorithm is grounded entirely in graph theory: the world wide web is a massive network of webpages (nodes) connected to one another through hyperlinks (edges).

The intuition behind Google's PageRank algorithm is this: a page's rank is a weighted sum of the ranks of all the web pages that link into it. Page B in this case has plenty of in-links, and hence its rank is high.
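That weighted-sum intuition can be captured in a few lines of power iteration. The sketch below is a minimal illustration, not Google's production algorithm, and the four-page web (`links`) is hypothetical, set up so that most pages link into B:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank: a page's rank is a damped, weighted
    sum of the ranks of the pages that link into it."""
    pages = sorted(set(links) | {q for tgts in links.values() for q in tgts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}    # teleport term
        for p in pages:
            targets = links.get(p, [])
            if not targets:                            # dangling page: spread evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:                                      # p passes rank/out-degree onward
                for q in targets:
                    new[q] += damping * rank[p] / len(targets)
        rank = new
    return rank

# Hypothetical four-page web: A, C and D all link into B, so B ranks highest.
links = {"A": ["B"], "B": ["A"], "C": ["B"], "D": ["A", "B"]}
ranks = pagerank(links)
```

Repeating the update until it stabilises gives the stationary rank vector; the damping factor models a surfer who occasionally jumps to a random page.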

Network science is applied in biology to understand the human disease network by connecting genes whose mutations are known to cause diseases. It is applied in electricity transmission networks to understand load and failure patterns, to uncover vulnerabilities in the network.

But perhaps the quirkiest application of graph theory comes from an internal organisational-structure analysis carried out by a Hungarian firm. While the formal hierarchy is typically used to pass down communications, informal networks often have a huge impact on employee perception. An accurate graph of these informal networks can be quite useful for identifying silos between functions, and equally for identifying powerful nodes. This Hungarian firm realised that the perceptions of its frontline staff had absolutely nothing to do with the intentions of senior management. So they conducted a study asking employees one simple question: who do you look to for advice on work-related matters when you need it?

In this graph, the different nodes are different employees. The edges, or connections between nodes, indicate who that employee looks up to for advice. Do you notice the hubs?

Their informal network graph clearly showed hubs: nodes with lots of incoming connections. One health-and-safety employee - who travelled to all the sites, was a jolly good fellow well liked by the frontline staff, and spent lots of time on the shop floor - turned out to be that red hub, with a disproportionate number of people saying they looked to him for advice. He was clearly passing on his own views about the leadership's actions. What do you suggest we do with him?

Why do we need to apply machine learning techniques to graphs in the first place?

One - Graphs are notoriously incomplete; we often lack critical information. Think of a bank with millions of accounts but no clear knowledge of which accounts are used for fraudulent purposes. Deep learning techniques are extremely effective in such areas, being able to perform node classification tasks.

Two - Graph machine learning techniques are also quite helpful in edge-level tasks such as link prediction. Think of recommender systems, or perhaps a very simple social network of friends, like the one below.

A simple social network of friends - The nodes represent people and the solid line edges represent friendships.

How likely is it that A and B end up being friends? Quite likely, isn't it? They are what are known as one-hop neighbours, connected through a mutual friend C. Now how likely is it for C and G to become friends in the future? Not as likely, given they are two-hop neighbours. This is a rather simple graph with 8 nodes and 8 edges. Now imagine a social network like Facebook, with around 3 billion active users (nodes) and over a trillion edges! Making friend recommendations at that scale is no simple task, and one that can realistically only be done with machine learning techniques.
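One classical baseline for this kind of friend recommendation is to score each missing edge by the number of mutual friends. The graph below is a hypothetical stand-in for the 8-node, 8-edge figure (the exact edges aren't reproduced here), wired so that A and B share the mutual friend C while C and G sit further apart:

```python
from itertools import combinations

def common_neighbour_scores(adj):
    """Score every non-edge by the number of shared friends: a crude
    but classical link-prediction heuristic for friend recommendations."""
    scores = {}
    for a, b in combinations(sorted(adj), 2):
        if b not in adj[a]:                      # only score pairs not yet connected
            scores[(a, b)] = len(adj[a] & adj[b])
    return scores

# Hypothetical 8-node, 8-edge friendship graph: A and B share the mutual
# friend C; C and G have no friends in common.
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"),
         ("E", "G"), ("G", "H"), ("H", "F"), ("F", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = common_neighbour_scores(adj)
```

A real system would feed scores like these (or learned node embeddings) into a ranking model rather than recommending on raw counts alone.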

Three - It is often beneficial to identify communities and motifs (dominant or recurring patterns) in networks. Consider this telecommunication network of people in Belgium. The nodes are people and the links are phone calls they frequently make. The red-green scale represents the language they speak: red for French speakers, green for Dutch speakers. Do you notice a clear pattern? Each language community is densely connected internally, while the links between the two communities are very weak. The bilingual speakers, shown in the middle and blown up for clarity, are critical to the society. They are the ones who help integrate the different communities and drive the emergence of consensus on issues of national importance. What happens if they are gone?
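The Belgian call data itself isn't available here, but the pattern of communities that are dense inside and weakly linked across can be illustrated with label propagation, one simple community-detection heuristic. The toy graph below - two tight cliques joined by a single weak tie - is made up for illustration:

```python
from collections import Counter

def label_propagation(adj, max_iters=20):
    """Detect communities: each node repeatedly adopts the most common
    label among its neighbours (asynchronous updates, deterministic
    tie-break on the largest label) until no label changes."""
    labels = {v: v for v in adj}
    for _ in range(max_iters):
        changed = False
        for v in sorted(adj):
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            new = max(l for l, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    return labels

# Two tight 4-cliques joined by a single "weak" edge (3-4), mimicking the
# two densely connected language communities in the phone-call graph.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
adj = {v: set() for v in range(8)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

communities = label_propagation(adj)
```

Labels flood freely inside each clique but stall at the single weak tie, so the two cliques end up with different community labels.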

While identifying communities in this network structure is easy for the naked eye, subtler network motifs are massively helpful in the biological sciences - in protein-protein interactions, for instance, or in identifying allergens in drugs. We know that cellular components associated with a specific disease (phenotype) tend to cluster in the same network neighbourhood, and Graph Neural Networks are ideally suited to identifying these components in biological networks.

But how do Machine Learning Models learn the intricate topology behind graphs in the first place?

At the heart of machine learning models on graphs, even in the most complex applications, is a concept called message passing. The idea is fairly simple: tell me about your friends, and I will tell you about you. Each node has neighbours, and each neighbour has further neighbours. Node attributes are fed in as features of a neural network, messages from neighbours are aggregated layer by layer, and an embedding is created for each node.

These node embeddings are then passed through a prediction head (for node labelling, edge classification and so on) and inferences are made. A loss function is used to train the model - standard stuff.
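As a minimal sketch of that pipeline's core step, here is one round of mean-aggregation message passing on a made-up toy graph; a real GNN layer would additionally apply a learned weight matrix and a non-linearity before the prediction head:

```python
def gnn_layer(adj, features):
    """One round of message passing: each node's new embedding is the
    average of its own features and its neighbours' features -
    "tell me about your friends and I will tell you about you"."""
    new = {}
    for v, feats in features.items():
        msgs = [features[u] for u in adj[v]] + [feats]   # neighbours plus self
        dim = len(feats)
        new[v] = [sum(m[d] for m in msgs) / len(msgs) for d in range(dim)]
    return new

# Toy triangle-plus-tail graph with hand-picked 2-d node features.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
feats = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.0, 0.0]}

h1 = gnn_layer(adj, feats)   # after one hop: node 3 only sees node 2
h2 = gnn_layer(adj, h1)      # after two hops: information from nodes 0 and 1 reaches node 3
```

Stacking k such layers lets each node's embedding absorb information from its k-hop neighbourhood, which is exactly why depth matters in a GNN.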

But that's not all - a lot of research and advances have been made on GNNs since the early days of Larry Page's PageRank algorithm. Today we can leverage advances from transformers (encoder-decoder) and attention models in GNNs. We can generate synthetic graphs to predict how networks will evolve and to identify anomalies. We can analyse more complex structures like bipartite and tripartite graphs. We can create embeddings for entire sub-graphs, not just individual nodes, so we can compare and contrast different network structures. And we can build knowledge graphs and predict node properties, with several rather cool real-world applications such as suggesting author-author collaboration opportunities and paper-citation recommendations.

Back to the course - how was it?

The course is fairly intense, but I have walked away with a thorough understanding of how GNNs work and what they can be used for, and with the ability to code a full-scale graph ML model with DeepSNAP and PyTorch Geometric. I must have spent hours and hours on the coding exercises trying to squeeze higher accuracy out of the GAT (graph attention network) and multi-head attention models - possibly the most challenging exercises in the program. I'd highly recommend it to anyone looking to scratch the surface a bit deeper.

On to the next one!
