Installing PyRaphtory

PyRaphtory can be easily installed via pip. This will pull all of the background dependencies for Raphtory, automatically setting up any system paths to point at the correct location. Our only requirement here is you running python version 3.9.13.

Install

pip install requests pandas pemja cloudpickle parsy
pip install -i https://test.pypi.org/simple/ pyraphtory_jvm==0.2.0a7
pip install -i https://test.pypi.org/simple/ pyraphtory==0.2.0a7

Running PyRaphtory

Once installed, let’s set up the most bare bones PyRaphtory graph, test that we can add some data to it and run our first query. Once this is all working we can move on to some much more exciting examples in the next section!

Before we start, however, you may have noticed that this page looks oddly like a iPython notebook. That is because it is! If you click the open on github link in the top right of the page you can follow along on your own machine.

To do this we first we need to import PyRaphtory alongside some helper classes. The second import here is for the Row class we use in the select function to get out our query results.

You will see some references to Java in the logs here, this is because under the hood Raphtory is written in Scala. You don’t have to worry about any of that though as its all hidden away!

[1]:
from pyraphtory.context import PyRaphtory
from pyraphtory.graph import Row

Creating your first graph

Once Raphtory is installed we can create our first graph! To do this we first need a context which we can get from the PyRaphtory object.

Our two options here are local and remote. As we are just testing it on our laptops we can use local, meaning the Raphtory code will run within your python process. We will dig into remote contexts later when you want to deploy in a seperate process or scale your graph past what your laptop can handle.

Once we have our context we can call new_graph() to create a graph which we can add data into and run queries on.

[7]:
context = PyRaphtory.local()
graph = context.new_graph()
graph.graph_id()
[7]:
'colourful_wheat_albatross'

Adding data to your Graph

Once a graph is created, we need to add some data to it if we want run anything interesting. There are loads of ways of doing this in Raphtory, which we will cover in the next section, but for simplicity lets just add some vertices and edges without any properties.

As Raphtory is focused on dynamic and temporal analysis, all events in the graph’s history (adding, updating or deleting nodes/edges) must happen at a given time. This can all be at the same time (if, for example, you are working with snapshots) but we still need a time.

As such, when we add a vertex we have two arguments: the timestamp and the vertex ID. Simiarly, when adding an edge, we have three arguments: the timestamp, the source vertex and the destination vertex.

Note: All graphs are directed by default in Raphtory, but can be projected into an undirected graph - we will go indepth into graph projections later in the tutorial.

In the following code block we have five updates for our graph, adding three vertices (1,2,3) at time 1 and two edges (1->2, 1->3) at time 2 .

[3]:
graph.add_vertex(1, 1)
graph.add_vertex(1, 2)
graph.add_vertex(1, 3)
graph.add_edge(2, 1, 2)
graph.add_edge(2, 1, 3)

Running your first Query

Now that our data is loaded we can start interrogating it!

While we can write some very complicated algorithms in Raphtory, lets start off with something simple, getting the indegree and outdegree of our nodes.

For this we call select on the graph, which takes a function to run on every vertex. This will return a Table full of Rows which represent the result for each node. From this point we can either write our results to a Sink (file, database, etc.), which we will cover later in the tutorial, or convert it into a dataframe for further analysis.

In this example we have called to_df to get a dataframe, giving it a list of the variables we want to be included. If you are wondering where these variable names come from, we will be explaining very shortly so bear with us!

If you have a look in the logs you can see that your query is given a Job ID and Raphtory will report how long it took for it to run.

[4]:
df = graph \
.select(lambda vertex: Row(vertex.name(), vertex.out_degree(), vertex.in_degree())) \
.to_df(["name", "out_degree", "in_degree"])
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.kryo.util.UnsafeUtil (file:/Users/bensteer/miniconda3/envs/pyraph39/lib/python3.9/site-packages/pyraphtory_jvm/data/lib/compile/kryo-shaded.jar) to constructor java.nio.DirectByteBuffer(long,int,java.lang.Object)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.kryo.util.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

Checking out the output

Finally, once our query has run and we have got our dataframe, we can take a look at the results.

One aspect which is notable here is that we requested three variables, but we have five columns. This is because algorithms in Raphtory run at set points in time, meaning the values for each vertex must be associated with a timestamp (in this case the most recent one 2).

As with every other cool feature I have hinted at, you will soon be an expert in queries, windowing and much more. All you have to do is continue on to the next page!

[5]:
df
[5]:
timestamp name out_degree in_degree
0 2 1 2 0
1 2 2 0 1
2 2 3 0 1