Node and Graph embeddings

  1. Node Level : Node Embedding
  2. Graph Level : Graph Embedding

Node Embedding

  • Associate a vector to each node
  • This vector is used as input for a machine learning model
  • Depends on the graph structure
  • Encodes similarity between nodes
  • Permutation equivariant function

Feature Engineering

  • Determine node features from domain knowledge (experts)
  • Compute node features from the graph structure
    • Degree
    • Centrality
    • Betweenness
    • Closeness
    • DeepWalk, node2vec, ...
    • Spectral embedding

Node Degree

Simply taking the degree of a node as a feature.

  • Indicates how important a user is in a social network (influencer)
  • Only very local information
  • Too limited on its own
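The degree feature can be read directly off an adjacency list; a minimal sketch in plain Python, on a made-up toy graph:

```python
# Node degree as a one-dimensional feature, computed from an
# adjacency list (plain Python, no libraries).
graph = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1],
    3: [0],
}

degree = {v: len(neigh) for v, neigh in graph.items()}
# Node 0 has the highest degree: in a social network it would be
# the most "connected" user, but the feature ignores everything
# beyond the immediate neighbourhood.
print(degree)  # {0: 3, 1: 2, 2: 2, 3: 1}
```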

Node Centrality

Estimate the position of a node within the graph

  • Closeness centrality
  • Betweenness centrality
  • Eigenvector centrality

Closeness Centrality

Measures how central a node is by calculating the reciprocal of the sum of the shortest path lengths from that node to all other nodes:

$C(v) = \frac{1}{\sum_{u \neq v} d(v, u)}$

  • In disconnected graphs, traditional closeness centrality isn't applicable. Alternative measures, like harmonic centrality, are used to handle infinite distances.
  • Closeness centrality assumes information spreads along shortest paths, which may not always reflect real-world scenarios.
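A minimal sketch of closeness centrality in plain Python, using BFS for the shortest-path lengths (graph given as an adjacency list; toy example graph):

```python
from collections import deque

def closeness(graph, v):
    """Closeness centrality of v: reciprocal of the sum of
    shortest-path lengths from v to every reachable node."""
    dist = {v: 0}
    queue = deque([v])
    while queue:            # breadth-first search from v
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return 1.0 / total if total > 0 else 0.0

# Path graph 0-1-2: the middle node is the most central.
path = {0: [1], 1: [0, 2], 2: [1]}
print(closeness(path, 1))  # 1/(1+1) = 0.5
print(closeness(path, 0))  # 1/(1+2) ≈ 0.333
```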

Betweenness Centrality

Quantifies the importance of a node within a network by measuring the number of shortest paths between pairs of nodes that pass through it. Nodes with high betweenness are critical connectors within the network.

$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$

  • $\sigma_{st}$: total number of shortest paths from $s$ to $t$.
  • $\sigma_{st}(v)$: number of those paths passing through $v$.
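A naive sketch of this computation in plain Python, fine for small graphs (in practice Brandes' algorithm is used instead):

```python
from collections import deque
from itertools import combinations

def bfs_counts(graph, s):
    """Distances from s and number of shortest paths to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(graph, v):
    """Naive betweenness: for every pair (s, t), add the fraction
    of s-t shortest paths that pass through v."""
    info = {s: bfs_counts(graph, s) for s in graph}
    score = 0.0
    for s, t in combinations(graph, 2):
        if v in (s, t):
            continue
        dist_s, sigma_s = info[s]
        if t not in dist_s:            # s and t are disconnected
            continue
        # v lies on an s-t shortest path iff d(s,v) + d(v,t) = d(s,t)
        if v in dist_s and dist_s[v] + info[v][0][t] == dist_s[t]:
            score += sigma_s[v] * info[v][1][t] / sigma_s[t]
    return score

# In a star graph, the centre lies on every leaf-to-leaf shortest path.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(betweenness(star, 0))  # 3.0 (three pairs of leaves)
```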

Eigenvector Centrality

Measures a node's influence within a network by considering not only the number of its direct connections but also the importance of the nodes it connects to.

For a graph with adjacency matrix $A$, the centrality score $x_v$ for node $v$ is:

$x_v = \frac{1}{\lambda} \sum_{u} A_{vu} \, x_u$

  • $\lambda$ is the largest eigenvalue of $A$.

  • $A_{vu}$ represents the connection between nodes $v$ and $u$.

  • Related to the PageRank algorithm
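A sketch of eigenvector centrality by power iteration in plain Python, assuming a connected, non-bipartite graph so the iteration converges (toy example graph):

```python
def eigenvector_centrality(graph, iters=100):
    """Power iteration: repeatedly set x_v to the sum of the
    scores of v's neighbours, then normalise. Converges to the
    leading eigenvector of A for a connected, non-bipartite graph."""
    x = {v: 1.0 for v in graph}
    for _ in range(iters):
        nxt = {v: sum(x[u] for u in graph[v]) for v in graph}
        norm = max(nxt.values())
        x = {v: s / norm for v, s in nxt.items()}
    return x

# The node connected to the most (and most important) nodes
# gets the highest score.
g = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
scores = eigenvector_centrality(g)
print(scores[0] == max(scores.values()))  # True
```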

Spectral embedding

Laplacian EigenMaps

Use the eigenvectors of the Laplacian matrix $L = D - A$ as feature vectors

  • The matrix $U_k$ of the $k$ eigenvectors associated with the smallest non-zero eigenvalues of $L$ is an embedding of the nodes into $\mathbb{R}^k$
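A minimal Laplacian eigenmaps sketch with NumPy, on a small hand-written adjacency matrix:

```python
import numpy as np

# Laplacian eigenmaps: L = D - A; the eigenvectors associated with
# the smallest non-zero eigenvalues give node coordinates.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A

# eigh returns eigenvalues in ascending order; for a connected
# graph the first eigenvalue is 0 with a constant eigenvector,
# which carries no information and is discarded.
eigvals, eigvecs = np.linalg.eigh(L)
k = 2
embedding = eigvecs[:, 1:1 + k]   # one row per node
print(embedding.shape)  # (4, 2)
```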

Example on KarateClub


Learn node embeddings

Learn representations instead of defining them explicitly

Shallow Encoding

Find vectorial representations $z_u$ and $z_v$ such that $z_u^\top z_v$ approximates the similarity between nodes $u$ and $v$.

Node Feature Matrix

  • $z_v$: vectorial representation of node $v$ (a column of the matrix $Z$)

Similarity Node Matrix

  • $S_{uv}$: similarity between $u$ and $v$; should be approximated by $z_u^\top z_v$

  • How to define $S$?

  • How to compute $Z$?

DeepWalk

  • The embedding encodes the similarity of random-walk distributions
  • Unsupervised learning
  • Uses random walks to explore the graph
  • Transforms node sequences into vectors using a Word2Vec-like approach

DeepWalk : Random Walks

  • For each node, compute random walks
  • Nodes composing a random walk constitute a sequence
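A minimal random-walk generator in plain Python; the walk count and length are arbitrary illustration values, and the toy graph is made up:

```python
import random

def random_walk(graph, start, length):
    """Uniform random walk: at each step, jump to a neighbour
    chosen uniformly at random."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
# DeepWalk-style corpus: several walks starting from each node.
walks = [random_walk(graph, v, 5) for v in graph for _ in range(10)]
print(len(walks), len(walks[0]))  # 40 5
```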

Random Walk computation

  • Transition matrix $P = D^{-1} A$
  • $(P^k)_{uv}$: probability that a random walk of length $k$ goes from $u$ to $v$
  • $\sum_{k=1}^{T} (P^k)_{uv}$: sum of these probabilities over different walk lengths

DeepWalk : Word2Vec for graphs

Learning node representations based on the similarity of their random walk distribution.

  • Node sequences are sentences (as in NLP)
  • Adaptation of the Skip-Gram model from word2vec to learn a node embedding.
  • Nodes sharing random walks in the graph are associated with similar vectors.

Node2vec

  • Improvement over DeepWalk.
  • Two new parameters to drive the random walks (BFS vs DFS)
  • Allows adapting the exploration to the graph

Node2vec

Tradeoff between local and global exploration

  • p (return parameter): controls the probability of going back to the previous node
  • q (in-out parameter): favours BFS-like (local) or DFS-like (global) exploration
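A sketch of one biased node2vec transition in plain Python; the weights 1/p, 1 and 1/q follow the description above, and the small example graph is made up for illustration:

```python
import random

def biased_step(graph, prev, cur, p, q):
    """One node2vec transition: from `cur`, having arrived from
    `prev`, weight each candidate x by
      1/p if x == prev                 (return to previous node)
      1   if x is a neighbour of prev  (BFS-like, stays close)
      1/q otherwise                    (DFS-like, moves away)."""
    candidates = graph[cur]
    weights = []
    for x in candidates:
        if x == prev:
            weights.append(1.0 / p)
        elif x in graph[prev]:
            weights.append(1.0)
        else:
            weights.append(1.0 / q)
    return random.choices(candidates, weights=weights, k=1)[0]

graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
# Large q discourages leaving the neighbourhood of the previous node.
step = biased_step(graph, prev=0, cur=1, p=1.0, q=1.0)
print(step in graph[1])  # True
```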


Example on KarateClub


Conclusion on node embeddings

  • Useful for node level tasks
  • Embeddings can be defined a priori or learnt from the structure in an unsupervised way
  • Widely used for network analysis
  • Learning embeddings according to a downstream task: next lesson!

Graph Embedding

  • Associate a vector to the whole graph
  • Permutation invariant function

Simple Approach

Achieve invariance with a permutation-invariant aggregation function (sum, mean, etc.)

The graph representation is then a summary of the set of node representations.

Depends on the quality of the node embeddings
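A minimal mean-pooling readout in plain Python; the toy node vectors are made up for illustration:

```python
# Permutation-invariant readout: the graph vector is the mean of
# its node vectors, so relabelling the nodes does not change it.
node_embeddings = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}

def mean_readout(embs):
    vecs = list(embs.values())
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

print(mean_readout(node_embeddings))  # [0.666..., 0.666...]
```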

Application-Dependent Approach

Use a set of predefined graph features.

  • Requires domain expertise
  • Substructure counting (more on that next)
  • Fingerprints for molecular compounds (RDKit, OpenBabel)
  • Molecular descriptors (avoid the graph problem)


Empirical Map

Consider that you are able to compute a distance between graphs (e.g. graph edit distance, GED).

In your learning dataset, identify anchor graphs $G_1$ to $G_m$, and embed each graph $G$ as the vector of its distances to the anchors: $\phi(G) = (d(G, G_1), \dots, d(G, G_m))$.

  • Depends on the quality (and number) of anchors
  • Versatile approach
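A sketch of the anchor embedding in plain Python; the `dist` function here is a toy proxy (difference in edge counts) standing in for a real graph distance such as GED, and the example graphs are made up:

```python
# Dissimilarity-space embedding: each graph is represented by its
# distances to m anchor graphs.
def dist(g1, g2):
    """Toy stand-in for a graph distance (e.g. GED):
    absolute difference in number of edges."""
    edges = lambda g: sum(len(n) for n in g.values()) // 2
    return abs(edges(g1) - edges(g2))

def embed(g, anchors):
    return [dist(g, a) for a in anchors]

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path3 = {0: [1], 1: [0, 2], 2: [1]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

anchors = [triangle, star4]
print(embed(path3, anchors))  # [1, 1]
```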

Topological embedding

Local similarities induce global similarity

  • Define a set of substructures
  • Compute the number of occurrences of each substructure in the graph
  • Encode label information through histograms [Sidere et al. 09]
  • Considering more detailed label information may lead to huge vector sizes
  • Related to fingerprints and bag-of-patterns kernels
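A minimal pattern-histogram sketch in plain Python, counting only edges and triangles as a stand-in for a richer substructure set (treelets, graphlets):

```python
from itertools import combinations

# Substructure counting: one histogram entry per pattern.
def pattern_histogram(graph):
    n_edges = sum(len(neigh) for neigh in graph.values()) // 2
    n_triangles = sum(
        1 for a, b, c in combinations(graph, 3)
        if b in graph[a] and c in graph[a] and c in graph[b]
    )
    return [n_edges, n_triangles]

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path3 = {0: [1], 1: [0, 2], 2: [1]}
print(pattern_histogram(triangle))  # [3, 1]
print(pattern_histogram(path3))    # [2, 0]
```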

Global Approaches

Fuzzy multilevel graph embedding [Luqman et al.13]

Add basic global information like the number of nodes or edges in the graph.

Spectral embedding [Luo et al. 03, Caelli et Kosinov 04, Luo et al. 06]

Define an embedding from the spectrum of the Laplacian or adjacency matrix

Matrix Factorization

Find the best embedding to recreate a distance matrix

  1. Build a similarity matrix $S$ by double-centring the squared distances: $S = -\frac{1}{2} J D^{(2)} J$, with $J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$
  2. Factorize $S$ to find the embedding: $S = X X^\top$ (⚠️ works only if $d$ is a Euclidean distance)
  3. The rows of $X$ encode the embeddings of the graphs
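The steps above can be sketched with NumPy as classical multidimensional scaling (MDS); the input distance matrix here is built from made-up 2D points so the recovery can be checked:

```python
import numpy as np

# Classical MDS: recover an embedding X from a Euclidean distance
# matrix D by factorising the double-centred similarity matrix.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

n = len(D)
J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
S = -0.5 * J @ (D ** 2) @ J                # Gram matrix of centred points

# If d is Euclidean, S is PSD and factorises as X X^T.
eigvals, eigvecs = np.linalg.eigh(S)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
k = 2
X = eigvecs[:, :k] * np.sqrt(np.clip(eigvals[:k], 0, None))

# The recovered embedding reproduces the original distances.
D_rec = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
print(np.allclose(D, D_rec))  # True
```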

Graph Kernels

All the embedding approaches seen so far require an explicit definition of the vector.

  • Limits the expressivity
  • The bigger the vector size, the more complex the computations

Use the kernel trick on graphs.

Kernel Theory

Kernel

  • Symmetric: $k(x, y) = k(y, x)$
  • For any dataset $\{x_1, \dots, x_n\}$, the Gram matrix $K$ is defined by $K_{ij} = k(x_i, x_j)$
  • $K$ is positive semi-definite: $c^\top K c \geq 0$ for all $c \in \mathbb{R}^n$
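These two properties can be checked numerically with NumPy, here for the Gaussian (RBF) kernel on a handful of made-up 1D points:

```python
import numpy as np

# A valid kernel yields a symmetric PSD Gram matrix on any dataset.
# Example with the Gaussian kernel k(x, y) = exp(-(x - y)^2).
X = np.array([[0.0], [1.0], [3.0]])
K = np.exp(-(X - X.T) ** 2)

print(np.allclose(K, K.T))                      # symmetric: True
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))  # PSD: True
```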

Reproducing Kernel Hilbert Spaces

  • Scalar product kernel: $k(x, y) = \langle \phi(x), \phi(y) \rangle_{\mathcal{H}}$

  • Distance kernel: $d(x, y)^2 = k(x, x) + k(y, y) - 2\,k(x, y)$

  • A kernel is a similarity measure between objects.

Kernel Trick with SVM

Objective function

$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)$

s.t. $0 \leq \alpha_i \leq C$ and $\sum_i \alpha_i y_i = 0$

Decision function

$f(x) = \operatorname{sign}\Big( \sum_i \alpha_i y_i \, k(x_i, x) + b \Big)$

Kernel Trick Example

Example

  • No linear solution in the input space.
  • Linear solution in the feature space $\mathcal{H}$.

Graph Kernels

Graph Kernel Theory

  • Graph kernel $k(G_1, G_2)$
  • Similarity measure between graphs
    • Uses the natural representation of the data (molecules)
    • Implicit graph embedding in $\mathcal{H}$ through $\phi$
    • Gives access to kernel methods (SVM, KRR, etc.)


Kernel Definition

Graph kernel based on bags of patterns

  1. Extraction of a set of patterns
  2. Comparison between patterns
  3. Comparison between bags of patterns

Walks and Paths

  • Patterns are walks or paths in the graph
  • Walk: sequence of nodes $(v_1, \dots, v_k)$ such that $(v_i, v_{i+1}) \in E$
  • Path: walk without repeated nodes

Tottering walk: walk that immediately returns to the previous node ($v_{i+2} = v_i$)


Graphlets

Set of subgraphs having at most 5 nodes


Treelets

Set of labeled sub-trees having at most 6 nodes

  • Encode structural information
  • Labeled patterns
  • Limited set of structures


Kernel Definition

Counting function

$f_t(G)$: number of occurrences of treelet $t$ in $G$

Kernel definition

$k(G, G') = \sum_{t \in \mathcal{T}(G) \cap \mathcal{T}(G')} k_t\big(f_t(G), f_t(G')\big)$

  • $\mathcal{T}(G)$: set of patterns extracted from $G$
  • $k_t$: similarity according to the numbers of occurrences of $t$

Graph similarity ≈ similarity of their bags of patterns
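In the simplest case, a bag-of-patterns kernel reduces to a dot product over shared pattern counts; a minimal sketch in plain Python with made-up pattern names:

```python
# Bag-of-patterns kernel: each graph is summarised by a dictionary
# {pattern: count}; the kernel is the dot product restricted to the
# patterns the two graphs share.
def bag_kernel(counts1, counts2):
    common = set(counts1) & set(counts2)
    return sum(counts1[t] * counts2[t] for t in common)

g1 = {"path2": 3, "triangle": 1}
g2 = {"path2": 2, "star3": 4}
print(bag_kernel(g1, g2))  # only "path2" is shared: 3 * 2 = 6
```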

Pattern Selection

  • Different properties ⇒ different causality.
  • Domain Expert Approach: Selection of relevant descriptors for a given task
  • Machine Learning Approach: Define the set of substructures according to an objective

Multiple Kernel Learning

Optimal combination of kernels:

$k(G, G') = \sum_{t \in \mathcal{T}} w_t \, k_t(G, G')$, with $w_t \geq 0$

  • $\mathcal{T}$: set of treelets identified in the learning set
  • $w_t$: measure of the contribution of treelet $t$
  • Each sub-kernel is weighted
  • Selection of relevant treelets
  • Weighting computed according to a given property

Weisfeiler-Lehman Kernel

Principle: aggregate neighbourhood information in an iterative fashion

  • Related to the WL isomorphism test

Question: how to derive an inexact similarity measure from this isomorphism algorithm?

Weisfeiler-Lehman Kernel

$k_{WL}(G, G') = \sum_{i=0}^{h} k_i(G, G')$

  • $k_i(G, G')$: similarity between graphs $G$ and $G'$ at iteration $i$
  • $k_i$ counts the number of labels common to both graphs at iteration $i$
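A sketch of one WL refinement step and the per-iteration label comparison, in plain Python on two toy graphs:

```python
from collections import Counter

# One Weisfeiler-Lehman refinement step: a node's new label is its
# old label plus the sorted multiset of its neighbours' labels.
def wl_step(graph, labels):
    return {
        v: (labels[v], tuple(sorted(labels[u] for u in graph[v])))
        for v in graph
    }

def common_labels(l1, l2):
    """Number of labels (with multiplicity) shared by two graphs."""
    c1, c2 = Counter(l1.values()), Counter(l2.values())
    return sum(min(c1[x], c2[x]) for x in c1)

g1 = {0: [1], 1: [0, 2], 2: [1]}          # path on 3 nodes
g2 = {0: [1, 2], 1: [0, 2], 2: [0, 1]}    # triangle
l1 = {v: "C" for v in g1}                  # identical initial labels
l2 = {v: "C" for v in g2}

print(common_labels(l1, l2))               # iteration 0: 3
l1, l2 = wl_step(g1, l1), wl_step(g2, l2)
print(common_labels(l1, l2))               # iteration 1: only the
# path's middle node matches the triangle's degree-2 pattern -> 1
```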


Weisfeiler-Lehman Kernel

  • Efficient to compute
  • Enrich the node description iteratively
  • Widely used in many applications
  • Foundations of modern Graph Neural Networks (next lesson)

Conclusion

  • Node and Graph embeddings are essential for graph analysis
  • Node embeddings are used for node level tasks
  • Graph embeddings are used for graph level tasks
  • Many methods exist to define embeddings
  • Kernel methods are a powerful tool to define embeddings
  • All embeddings are limited by the choices made
