Implementing your own algorithms

In the previous tutorial we looked at the algorithms within the standard Raphtory library and the different ways we can apply them to a graph. While we have also written some simple custom functions, here we’re going to go over how to write a new multi-step algorithm. This example algorithm will calculate the degrees of separation from one a character to every other, defaulting to Gandalf.

As background: Six degrees of separation is “the idea that all people on average are six, or fewer, social connections away from each other.”

How analysis works in Raphtory

Raphtory’s analysis engine is based on vertex centric computation where algorithms are executed in bulk synchronous parallel supersteps. Within each superstep all vertices complete a function, within which they may execute any of the follow GAS operations:

  • Gather: Receive information sent from other nodes in the graph (normally neighbouring nodes).

  • Apply: Calculate some new state and store these values.

  • Scatter: Send this state to neighbours to be used in the next step.

While Raphtory does have APIs for global aggregation and histograms, individual vertices have limited knowledge of the graph and must converge to their final state over several supersteps. Each vertex knows:

  • Its own update history, property set and property history e.g. values and update times.

  • The history/properties of the edges attached to it - both incoming and outgoing.

  • Its own algorithmic state - this is a map of temporary values that can be set and modified during analysis.

Writing algorithms in this manner allows them to be run on both your laptop and a cluster without requiring any change. Let’s see how it looks to implement the proposed Degree Separation algorithm in this format.

To begin first import pyraphtory, create a local context and ingest the Lord of The Rings data into a graph.

[1]:
from pyraphtory.algorithm import PyAlgorithm
from pyraphtory.graph import TemporalGraph, Row, Table
from pyraphtory.vertex import Vertex
import csv
from pyraphtory.context import PyRaphtory
from pyraphtory.graph import Row

structure_file = "/tmp/lotr.csv"
ctx = PyRaphtory.local()

graph = ctx.new_graph()
with open(structure_file, 'r') as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        source_node = row[0]
        destination_node = row[1]
        timestamp = int(row[2])
        graph.add_vertex(timestamp, source_node, vertex_type="Character")
        graph.add_vertex(timestamp, destination_node, vertex_type="Character")
        graph.add_edge(timestamp, source_node, destination_node, edge_type="Character_Co-occurence")

Algorithm Structure

Now that we have the graph set up we can create a new algorithm.

Algorithms are classes which extend PyAlgorithm. This class has two functions to implement, __call__() and tabularise():

  • __call__ is where we will be doing the majority of our computation. This takes a TemporalGraph and applies a set of operations to it (the functions we will define) and returns a new TemporalGraph.

  • tabularise is where we will choose what data on our vertices to return. This takes a TemporalGraph and returns a Table which can either be turned into a dataframe or output to a Sink.

In this tutorial we will implement both as we want to compute a new property and output the results, but in many instances we may only need one or the other. For instance, we chained multiple algorithms together in the last tutorial by only applying the __call__ functions of each algorithm. No tabularise was then required as we finished off with our own call to select (returning a Table).

__call__()

There are many functions which many be called on the TemporalGraph to build algorithms as complex as you like, these can be found in the PythonDocs on the left. The main two of these are step() and iterate().

  • step takes a function which will execute on all vertices once. This is normally to set up an algorithm or called in succession when you have a known number of steps to complete an algorithm.

  • iterate takes a function which will be applied on vertices until some convergence state is met. This is normally used for the bulk of an algorithm when the number of required steps is unknown.

    • This function also requires the user to set a maximum number of steps (to bound the computation) and to decide if all vertices need to execute every superstep or if only messaged vertices are required to, minimising wasted cycles.

tabularise()

As tabularise receives a TemporalGraph we can continue calling the same functions here, however, be aware that these won’t be applied if you use this algorithm in a chain or run your own select. Once you have called select, as we have seen before, we can apply several Table steps, such as filter, before returning our data in the desired format.

Degree Separation from 10,000ft

For our algorithm we want to take these building blocks and set it up such that we start at Gandalf who will tell all of his neighbours they are 1 hop away. These nodes should then message all their neighbours telling them they are 2 hops away, and so on until no more nodes can be reached. Importantly, nodes should only update their state once, as we want the lowest number of hops.

Lets take a look at an example implementation of this algorithm, we can then go through what is happening step by step.

[3]:
state_name = 'separation' #name the state we are going to be creating on each vertex

class DegreeSeparation(PyAlgorithm): #our new class extends PyAlgorithm

    #create our __call__ function which takes the TemporalGraph and a name (defaulting to Gandalf)
    def __call__(self, graph: TemporalGraph,name = "Gandalf") -> TemporalGraph:

        #create a setup step function
        def step(v: Vertex):
            #find our starting node - the one with the correct name
            if v.get_property_or_else("name","") == name:
                #Set the `separation` of this node to 0 and send this value to all its neighbours
                v[state_name] = 0
                v.message_all_neighbours(0)
            else:
                #if this isn't the starting node, set its separation to -1, which we can imagine means 'unreached'
                v[state_name] = -1

        #create our iteration function
        def iterate(v: Vertex):
            #if the node hasn't been reached before (current state is still -1)
            if v[state_name] == -1:
                #take the minimum value of the messages received and add 1
                sep_state   = min(v.message_queue()) + 1
                #set its state to the new separation value
                v[state_name] = sep_state
                #and tell its neighbours
                v.message_all_neighbours(sep_state)
            #if the node has been reached before we can ignore the messages and vote to halt
            else:
                v.vote_to_halt()

        #finally we apply these two functions to the graph
        #for iterate we set the maximum supersteps to 6 and
        # execute_messaged_only to True as nodes which receive no new information don't need to do anthing.
        return graph.step(step).iterate(iterate, iterations=6,execute_messaged_only=True)

    #define our tabluarise function which calls select on the graph and returns the vertex name and its separation value
    def tabularise(self, graph: TemporalGraph) -> Table:
        return graph.select(lambda v: Row(v.name(), v[state_name]))

#execute our new algorithm on the graph and return a dataframe
graph \
.execute(DegreeSeparation()) \
.to_df(["name", state_name])

[3]:
timestamp name separation
0 32674 Hirgon 1
1 32674 Hador 3
2 32674 Horn 2
3 32674 Galadriel 1
4 32674 Isildur 1
... ... ... ...
134 32674 Faramir 1
135 32674 Bain -1
136 32674 Walda 2
137 32674 Thranduil 2
138 32674 Boromir 1

139 rows × 3 columns

Digging in the steps

As you can see from the output we now have a separation value for all vertices in the graph. Most of these scores are quite low with Gandalf being a central character, but we still have a couple with -1 who were unreachable. Lets dig into how we calculated these scores.

Step

To begin we have one setup step in which we:

  • First check if a vertex is our starting point, where in the default case we are looking for Gandalf.

  • Once this node is found we:

    • Create a new state on the vertex representing the separation score and set it to 0 - i.e. no separation

    • Message all of the node’s neighbours to tell them they are directly connected.

  • If a node is not the starting point their state is initialised to -1. We can read this as currently unreachable.

Iterate

Next we implemented the bulk of the algorithm inside of a singular iterate function. Within this we:

  • Check if the vertex has been reached before (its state is still -1).

  • If it hasn’t we take the minimum value it has been messaged by its neighbours and increment this by one.

  • This is its separation from the starting node which it can then tell to its neighbours.

  • If it has already been contacted it ignores all messages and votes to halt meaning the node is happy with its current state and it thinks the algorithm has converged.

This function is then set to execute only on vertices that have been sent a message (execute_messaged_only=True) and runs up to 6 times (iterations = 6) for our six degrees of separation).

The Return of The King

Following the definition of __call__() we implemented a tabularise() which goes through all vertices and extracts the node name and the final degree of separation.

You could additionally add a filter here to ignore any nodes with a score of -1. This would exclude nodes that are not at all connected to Gandalf or whose shortest path to Gandalf is longer than 6 hops

What is next?

You should now be equipped with all the tools you need to do some very cool graph analysis with Raphtory. Your next stop should be the Python docs on the left where you can dig into the variety of functionality available on the Graph, Vertex and Table. We also have several example projects with some interesting algorithms which will be a good inspiration!

If you have any issues or just want to tell us about your awesome new project, come and join us on Slack where there are always many members of the Raphtory team hanging out.