An Introduction to Graph Networks – my notes: Part 1 of X
Ajay Taneja
Senior Data Engineer | Generative AI Engineer at Jaguar Land Rover | Ex - Rolls-Royce | Data Engineering, Data Science, Finite Element Methods Development, Stress Analysis, Fatigue and Fracture Mechanics
1. Introduction
?
I had announced in a LinkedIn post some days ago [https://www.dhirubhai.net/posts/ajay-taneja-47727817_machinelearning-deeplearning-graphnetworks-activity-7132788256881917952-ncXP?utm_source=share&utm_medium=member_desktop ] that I will be publishing my notes on Graph networks as a series of articles/blogs. This is my first blog in the series and here I talk of some fundamental aspects of Graph Networks. This article is organized as follows:
?
?
?
?
2. What are Graphs and why Graphs?
?
Graphs are a general language for describing and analysing entities with relationships and interactions. This means that rather than thinking of a domain as specific data points we can think of it in terms of networks and relationships between the entities. This means that there is an underlying graph of relation between the entities. The structure of the graph will denote the relationship between the entity’s as shown in the figure below.
2. Data as Graphs: Examples
?
There are many types of data that can be represented as graphs and modelling the graphical relations allows us to build accurate models of the underlying phenomenon. Some of the examples of graph networks include:
Stated below is a concise explanation of the above graph networks.
Event Graphs:
The figure below is an example of an event graph where the circles indicate individual events, the solid arrows are causal links extracted from the game, and the dashed arrows are player-links derived from the flock of cameras.
Disease Pathways:
This is an example of a Protein-Protein interaction wherein a system of interacting proteins collectively produces some disease. In the figure below, proteins associated with a disease are projected onto the protein-protein interaction (PPI) network. Disease pathway is then a subgraph of the PPI network defined by the set of disease-associated proteins.
Food webs:
?
The figure shows a simple food web for a community of seven species: robin, fox, grasshopper, raccoon, salamander, milksnake, and toad. It may be noticed that, here, that we can de?ne competition using the food web. Two species compete if and only if they have a common prey. Thus, in the example of the figure below, raccoon and fox compete (since robin is a common prey), milksnake and raccoon compete, while salamander and robin do not compete. We use this competition relation to de?ne a graph called the competition graph.
?Particle networks [Ref: https://commons.wikimedia.org/wiki/File:Elementary_particle_interactions.svg ] :
The diagram below summarizes the tree-level interactions between elementary particles described in the Standard Model. Vertices (darkened circles) represent types of particles, and edges (blue arcs) connecting them represent interactions that can take place.
The organization of the diagram is as follows: the top row of vertices (leptons and quarks) are the matter particles; the second row of vertices (photon, W/Z, gluons) are the force mediating particles; and the bottom row is the Higgs boson.
Underground Network:
The London Underground Tube Map is also a graph. The stations are the vertices and the train lines joining them are the edges. Interchanges are shown where different lines connect.
Social Network Graph
Model of a social network describes a graph with several types of nodes and edges with respect to each activity on the online platform. According to each model, many sub graphs can be extracted and analyzed in order to predict behavioural patterns of the users within the network.
The figure below shows a typical graph of a social media platform as LinkedIn or Facebook or X (previously, Twitter).
Citation network graph:
Here is a nice interactive example of a citation network graph: https://signsat40.signsjournal.org/cocitation/interactive/
?
Network of neurons:
This is graph showing how neurons in human brain are connected.
Knowledge Graphs:
And then, we can take knowledge and represent facts as relationship between different entities. These are called as Knowledge Graphs. ?Knowledge graphs, as shown in Figure, are networks of entities, their semantic types, properties and relationships (Nickel et al. 2016).
Software Code:
4. Graphs for Machine and Deep Learning: Image vs text vs graph processing for ML/DL
?
With the several examples described above, the question to be answered is how we take advantage of the relational structure inherent in a graph for better and accurate prediction. This is because explicit modelling of relationship can help us achieve better and more accurate predictions.
This leads us to the next topic of discussion – let us understand why graph networks are difficult to process for Machine and Deep Learning.?
?
Why are graph networks difficult to process for Deep Learning?
?
It should be underscored that compared to images or text; graphs are relatively difficult to process for deep learning. This is because graph networks have arbitrary size and complex topology as shown in the figure below.
Contrary to images which are represented as fixed grids or text which are sequences represented as higher dimension word embedding – in case of graphs there is no spatial locality. It may be recalled that in case of text, we have a reference point – thinking of the state-of-the-art transformers - we have the concept of positional encoding which allows us to model the order of the words [https://www.dhirubhai.net/pulse/deep-dive-positional-encodings-transformer-neural-network-ajay-taneja/?trackingId=mSLcJxc7TlOj18qu3bBvEQ%3D%3D ]
?
What is the Deep Learning process in Graphs?
Deep Learning process in graphs will involve taking the entire graph network, passing it through the “designed” neural network architecture that allows us to do this end-to-end without any human feature engineering.
For the Deep Learning process in graphs, each node will be represented as “d” dimensional embedding such that similar nodes in the network are embedded close together in the embedding space. depicting a feature representation of a node or an embedding of a link or an embedding of an entire graph.
Methods for Machine Learning and Representation Learning (Representation Learning implies the automatic extraction of features from the graph without any human feature engineering) for Graph Structural Data:
?
I will be talking of all of the above methods in my subsequent blogs in this series.
5. Components of a Graph and Design Choices:
?
Next, let us see what the components of a graph are or a network – the network is composed of two types of objects; first we have objects or entities known as nodes or vertices and then we have interactions between these nodes which are represented as links or edges – the entire domain is called a network or a graph.
?
It should be understood that the choice of a proper network representation of a given domain/problem determines our ability to use the network successfully. In some cases, there is a unique, unambiguous representation but in other cases the representation is by no means unique. The way you assess the links will determine the nature of the problem you study.
Design Choices:
Let us now see what the design choices are we’re faced with whilst creating graphs from data. Firstly, let us define Undirected Graphs and Directed Graphs
Undirected vs Directed Graphs:
Undirected graphs are graphs used for modelling symmetric or reciprocal relationships e.g., collaboration, friendship, interaction between proteins, etc.
领英推荐
Contrary, to undirected graphs, directed graphs are captured by directed links where every link has a source and a destination Examples include phone calls, financial transactions, following on LinkedIn / Instagram. This is denoted by an arrow as shown in the figure below.
Undirected Graphs: Node Degree
Node Degree denotes the number of edges adjacent to a given node. For example, the node A in the figure below has degree 4 as it’s got 4 edges connected to it.
Undirected Graphs: Average Node Degree:
Average Node Degree simply denotes the average degree of all the nodes in the network. Mathematically, the average node degree is represented as:
It may be noticed that mathematically, the average node degree will work out to twice the number of edges in the network divided by the number of nodes in the network- one can understand here that when we’re computing the node degree each edge gets counted twice because every edge has two end points.
The above is valid for “Undirected Graphs”.
Directed Graphs: Node Degree
In Directed Graphs, we have the in-degree and out-degree. In-degree is the number of edges pointing towards the node whereas the out-degree is the number of edges pointing outward.
It may be noticed from the figure below, that the node C has an in-degree of 2 and out-degree of 1.
6. Bipartite Graphs and Folded Bipartite Graphs
?
Bipartite Graph [Bipartite ~ 2 partitions]:
Another very popular type of graph structure that is used quite often is called “Bipartite Graph”. Bipartite Graph is a graph where the nodes can be divided into two disjoin sets U and V such every link connects to a node in U and to one node in V such U and V are independent sets,
Examples of Bipartite Graph include:
?
Folded Bipartite Graph:
Having discussed about the Bipartite Graph, it is worthwhile talking about Folded Bipartite Graph – considering an example where we have Bipartite Graph as shown below which denotes scientific authors (1, 2, 3, 4, 5, 6) linked to the publications (A, B, C, D).
?
Now, if we just take the vertices of the scientific authors and connect these vertices if they have co-authored certain publication as shown in the network below.
Then this type of graph is called Folded or Projected Bipartite Graph. As shown in the figure above, since 1,2,3 have co-authored Publication A, they are connected. Similarly, since nodes 2 and 5 have co-authored publications 4, 5, 6, 7 they are connected as shown above.
?
Analogous, to the above we can create a folded network corresponding to the publications A, B, C and D and we will obtain a folded network as shown in the figure below.
It can be seen from the figure above, that, since the publications B, C, D have been co-authored by 5, they are connected and since publications B and A are co-authored by 2, they are connected.
?
7.How do we represent graphs numerically?
?
Adjacency Matrix for Undirected Graphs:
Representing graphs numerically is again an interesting problem. One way to represent graphs numerically is through an “Adjacency Matrix”. For example, in case of a network s shown in the figure below which ahs 4 nodes, the Adjacency matrix will be a square matrix of size 4 x 4 and will have entries 0 or 1. The element ij of the matrix will have an entry 1 if the nodes I and j are connected and an entry 0 if the nodes I and j are not connected.
With the above rule, the Adjacency matrix of a 4 noded network will be represented as below:
It should be noted – the Adjacency Matrix of an Undirected Graph will naturally be symmetric – as implied through the definition above.
?
Adjacency Matrix of a Directed Graph:
If the graph is Directed, then, its Adjacency Matrix will not be symmetric. This is because if the node I is connected to j, then, the element Aij will be equal to 1 but since the node j is not connected to I, the element Aji will be 0.
Node Degree in terms of Adjacency Matrix:
I have talked about the node degree in section 5. Having obtained the Adjacency matrix, it is possible express the node degree in terms of Adjacency Matix.
For an undirected graph,
For a directed graph,
Adjacency Matrices are Sparse Because Networks are Sparse Graphs:
?
One important consequence of real-world graphs is that they are extremely sparse. Thus, if we look at the Adjacency Matrix of a real-world network, it is extremely sparse – that is: many vertices/nodes are not connected and thus the Adjacency matrix has many 0’s.
One can imagine that – human social network, citation network is extremely sparse [it is not possible for everybody to be connected to everybody or all authors have all publications!]. Thus, very rarely the Adjacency matrix will be a dense matrix.
?
Representing Graph as an Edge List:
There are 2 other ways to represent a graph numerically – one is to represent graph as an edge list. This representation is also very popular in Deep Learning frameworks.
For a network shown in the figure below, the graph can be represented as a list of edges as illustrated below.
However, unlike the Adjacency matrix, it is not trivial to do any kind of analysis of a graph through this type of representation because even computing a degree of a given node is not trivial.
?
Representing Graph as an Adjacency List:
Another common representation of a graph is Adjacency List. Adjacency List is easier to work if the network is sparse and large and allows us to quickly retrieve all neighbours of a given node.
For example, for the network shown, its Adjacency List is shown to the right in the figure:
?8 .Nodes and Edge Attributes:
?
A last important thing that is noteworthy is that graphs can have attributes attached to them. Nodes, edges as well s the entire graph can have attributes/properties attached to them.
Example:
?
Edges can have different properties – if it’s a phone call- its duration.
Nodes can have attributes such as: age, gender, location depending upon the entity under consideration.
?
Representing properties within the Adjacency Matrix:
It is also possible to represent such as weights within the Adjacency Matrix as shown in the figure below – here the weight represents the strength of the connection.
Multi-graphs
Sometimes one may want to represent a graph as a multi-graph as illustrated in the figure below where there is more than one connection between nodes. This can be common where the entities under consideration have more than one transaction involved.